Compositions and methods for high-level, large-scale production of recombinant proteins

ABSTRACT

Compositions and methods for the high-level, large-scale production of recombinant proteins are disclosed. Illustrative compositions comprise one or more expression vectors capable of high-level protein and/or polypeptide expression in combination with an immortalized host cell-line capable of growth in serum-free, suspension culture. Also disclosed are bi-directional UCOE vectors that permit the simultaneous, high-level expression of two or more recombinant proteins and/or polypeptides from a single UCOE-based plasmid vector.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is related to U.S. Provisional Application No. 60/352,404 filed Jan. 29, 2002, U.S. Provisional Application No. 60/333,620 filed Nov. 26, 2001, and U.S. Provisional Application No. 60/295,961 filed Jun. 4, 2001, which are hereby incorporated in their entirety by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to gene expression and protein production and, more specifically, to compositions and methods for the overexpression of recombinant proteins. Such compositions and methods are useful in the high-level, large-scale production of recombinant proteins.

[0004] 2. Description of Related Art

[0005] A major goal of the biotechnology industry is the development of stable cell-line based systems for the large-scale expression of recombinant proteins such as, e.g., recombinant antibodies. Standard methodologies require time consuming and labor intensive development of suitable recombinant host cell-lines. Conventionally, cells, such as, e.g., CHO-K1 or CHO DUX, are grown in the presence of fetal bovine serum and transfected with the expression vector of interest. The entire population of cells subsequently undergoes a process of selection to remove cells that failed to take up the expression vector. The vector-containing pool is then, typically, subcloned and screened for high-level expression. Each of the resulting high-level expressing clones is then expanded and slowly adapted to serum-free, suspension culture, an adaptation that often results in the loss of expression of the recombinant protein and/or polypeptide.

[0006] In addition to these general limitations in recombinant protein expression, efficient functional expression of multi-subunit proteins, such as, e.g., antibodies, requires appropriately balanced expression of both subunit chains. For example, traditional methodologies for the expression of antibody heavy and light chains rely on the co-transfection of plasmids independently carrying a heavy chain and a light chain coding region. This approach makes the maintenance of an equal copy number difficult and provides the potential for transcriptional interference between the genes if the vectors integrate close to one another in the genome.

[0007] Thus, in spite of considerable research, there remains a need in the art for improved compositions and methods for high-level, large-scale expression of recombinant proteins and/or polypeptides including antibody heavy and light chains. The present invention fulfills these needs and further provides other related advantages by utilizing host cell-lines that are pre-adapted for serum-free, suspension culture in combination with suitable expression vectors for recombinant protein expression. Also provided herein are bi-directional UCOE vectors that permit the simultaneous, high-level expression of two or more recombinant proteins and/or polypeptides from a single UCOE-based plasmid vector.

SUMMARY OF THE INVENTION

[0008] The present invention is directed, generally, to compositions and methods for the rapid and efficient development of recombinant cell-lines that are suitable for high-level, large-scale development and manufacture of recombinant proteins and/or polypeptides.

[0009] In one aspect, the present invention provides compositions, comprising: (a) an immortalized host cell-line, capable of continuous growth in culture, which host cell-line is capable of growth in serum-free suspension culture, and (b) a vector for sustained overexpression of a recombinant protein and/or polypeptide, such as a UCOE-based vector described herein.

[0010] The present invention, in another aspect, provides methods for the high-level, large-scale production of polypeptides. Particular methods comprise the steps of (a) obtaining an immortalized host cell-line capable of growth in suspension; (b) adapting the host cell-line for growth in serum-free medium; and (c) transfecting the resulting immortalized host cell-line capable of growth in suspension and serum-free medium with a vector suitable for overexpression of a recombinant protein and/or polypeptide.

[0011] According to the compositions and methods of the present invention, suitable immortalized host cell-lines may possess one or more of the following properties: (a) doubling times of no more than 16 hours, preferably between 12 and 16 hours; (b) transfection efficiency of at least 70%, preferably at least 75%, 80%, 85%, 90% or 95%; (c) susceptibility to standard selection agents such as, for example, hygromycin, G418, and puromycin; and (d) absence of gal-gal glycosylation of recombinant protein and/or polypeptide.

[0012] Exemplary immortalized host cell-lines that may be adapted for use in the presently claimed invention include, but are not limited to, the following commercially available host cell-lines: (a) CHO-S (a Chinese hamster ovary host cell-line); (b) 293-F (a human host cell-line); (c) 293-H (a human host cell-line); (d) COS-7L (a monkey host cell-line); (e) D.Mel-2 (an insect host cell-line); (f) Sf21 (an insect host cell-line); and (g) Sf9 (an insect host cell-line). Alternatively, suitable host cell-lines may be obtained through routine experimentation following the methodologies disclosed herein.

[0013] Vectors for overexpression of recombinant proteins and/or polypeptides suitable for use in the compositions and methods of the present invention may possess one or more of the following properties: (a) contain one or more elements that facilitate high-level, large-scale expression in the immortalized host cell-line and (b) are resistant to repression of expression of the recombinant protein and/or polypeptide.

[0014] Within certain embodiments, vectors of the present invention may further comprise one or more universal chromatin opening elements (UCOEs) as defined herein below. Additionally or alternatively, vectors as disclosed herein may comprise one or more transcriptional promoters such as, for example, the CMV promoter.

[0015] Preferred compositions and methods of the present invention are capable of achieving expression levels of at least 50 mg recombinant protein and/or polypeptide per liter of culture, more preferably at least 100 mg recombinant protein and/or polypeptide per liter, and still more preferably at least 200 mg recombinant protein and/or polypeptide per liter.

[0016] The present invention further provides compositions and methods that are capable of scale-up to at least 100 liter scale with yields (per 100 liter culture) of at least 1 gram of protein and/or polypeptide, more preferably at least 5 grams of protein and/or polypeptide, still more preferably at least 10 grams of protein and/or polypeptide, and most preferably at least 20 grams of protein and/or polypeptide.
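
By way of illustration only, the relationship between a per-liter expression level and a per-batch yield of the kind recited above can be sketched as simple unit arithmetic. The function names below are hypothetical and form no part of the disclosure. The sketch assumes that the titer is uniform throughout the culture volume and ignores any losses during recovery.

```python
# Illustrative unit arithmetic only; the function names are hypothetical.
# Assumes a uniform titer across the culture volume and no recovery losses.

def batch_yield_grams(titer_mg_per_l: float, volume_l: float) -> float:
    """Total protein (g) contained in a culture of the given volume."""
    return titer_mg_per_l * volume_l / 1000.0


def required_titer_mg_per_l(target_grams: float, volume_l: float) -> float:
    """Titer (mg/L) needed to obtain a target total yield (g) in the given volume."""
    return target_grams * 1000.0 / volume_l


if __name__ == "__main__":
    # Yield thresholds recited per 100 liter culture.
    for grams in (1, 5, 10, 20):
        titer = required_titer_mg_per_l(grams, 100.0)
        print(f"{grams} g per 100 L corresponds to {titer} mg/L")
```

Under these assumptions, a yield of 20 grams per 100 liter culture corresponds to a titer of 200 mg per liter.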

[0017] The present invention still further provides compositions and methods employing bi-directional vector systems for the high-level expression of two or more recombinant proteins on a single UCOE-based plasmid vector. Exemplary bi-directional vector systems may comprise one or more transcriptional promoters selected from the group consisting of the murine CMV promoter, the human CMV promoter, and the human beta-actin promoter.

[0018] The present invention also provides compositions and methods for improved expression of one or more recombinant proteins comprising an RNP UCOE-based plasmid vector, such as, e.g., CET720GFP, optionally comprising one or more deletions within the 8 kb RNP UCOE portion. Illustrative UCOE deletion constructs will preferably retain significant UCOE activity, e.g., at least about 50%, preferably at least about 75%, and more preferably at least 90% or more of UCOE activity relative to the activity of the 8 kb RNP UCOE element described herein. Exemplary deletions may, optionally, comprise deletions within regions of the RNP UCOE selected from the group consisting of ΔBS, ΔEcoNI, ΔEM, ΔMluI, and ΔRV, as depicted in Table 4 and FIG. 14. Deletions within the scope of the present invention are preferably at least 100 bp, more preferably at least 250 bp, still more preferably at least 1000 bp, still more preferably at least 2500 bp and still more preferably at least 4000 bp. Particularly illustrative UCOE vectors of the present invention will thus minimally comprise one or more UCOE portions, wherein the UCOE portions retain a desired level of UCOE activity. In one illustrative embodiment, at least about a 4.1 kb UCOE portion corresponding to nucleotide residues 5152-9254 of CET720GFP (SEQ ID NO: 9) is employed. This UCOE portion, for example, has been demonstrated herein to retain a level of UCOE activity comparable to that observed for the full 8 kb UCOE element corresponding to nucleotide residues 2225-10525 of CET720GFP (SEQ ID NO: 9). These and other UCOE portions can be readily identified, and their activities evaluated, via routine and art-recognized techniques in view of the disclosure provided herein.
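
By way of illustration only, the sizes of the UCOE portions recited above can be checked from their nucleotide residue ranges. The names below are hypothetical, and the sketch assumes that the residue ranges are 1-based and inclusive at both ends, a numbering convention that is not stated explicitly in the text.

```python
# Illustrative only; names are hypothetical. Residue ranges are ASSUMED to be
# 1-based and inclusive at both ends (convention not stated in the text).

FULL_UCOE = (2225, 10525)     # full RNP UCOE element of CET720GFP (SEQ ID NO: 9)
UCOE_PORTION = (5152, 9254)   # illustrative UCOE portion of CET720GFP


def span_length_bp(start: int, end: int) -> int:
    """Length in base pairs of an inclusive residue range."""
    return end - start + 1


def is_within(inner: tuple, outer: tuple) -> bool:
    """True if the inner range lies entirely inside the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    full = span_length_bp(*FULL_UCOE)
    portion = span_length_bp(*UCOE_PORTION)
    print("full element:", full, "bp")
    print("portion:", portion, "bp")
    print("portion lies within full element:", is_within(UCOE_PORTION, FULL_UCOE))
    print("sequence absent from portion:", full - portion, "bp")
```

Under the stated convention, the two ranges span roughly 8.3 kb and 4.1 kb, respectively, consistent with the approximate sizes given in the text.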

[0019] These and other aspects of the present invention will become apparent upon reference to the following detailed description and attached drawings. All references disclosed herein are hereby incorporated by reference in their entirety as if each was incorporated individually.

BRIEF DESCRIPTION OF THE DRAWINGS AND SEQUENCE IDENTIFIERS

[0020] FIG. 1 is a diagrammatic representation of UCOE-based antibody expression cassettes.

[0021] FIGS. 2A and 2B are plasmid maps of vectors that may be used for expression of recombinant human antibodies. FIG. 2A shows a plasmid for expression of recombinant human Ig heavy chain. FIG. 2B shows a plasmid for expression of recombinant human Ig kappa light chain.

[0022] FIG. 3 is a graph depicting antibody expression levels in CHO cells transfected with and without UCOEs.

[0023] FIG. 4 shows the results of scale-up of a CHO-S cell line transfected with vectors expressing the Heavy and Light chains of antibody Ab1 in shake-flask culture and in a 2 liter bioreactor. The left-hand panel shows antibody titer determined by ELISA. The right-hand panel shows cell growth.

[0024] FIG. 5 is a graph depicting the levels of Gal-Gal residues on the surface of murine hybridoma, CHO-K1, and CHO-S cells.

[0025] FIG. 6 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUneo100.

[0026] FIG. 7 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUneo200.

[0027] FIG. 8 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro300.

[0028] FIG. 9 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro400.

[0029] FIG. 10 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUneo500.

[0030] FIG. 11 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUneo600.

[0031] FIG. 12 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro700.

[0032] FIG. 13 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro800.

[0033] FIG. 14 is a diagrammatic representation of deletions within the 8 kb RNP UCOE of CET720GFP.

[0034] FIG. 15 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro350.

[0035] FIG. 16 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro450.

[0036] FIG. 17 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUneo1200.

[0037] FIG. 18 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro1450.

[0038] FIG. 19 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUneo1600.

[0039] FIG. 20 is a diagrammatic representation of the bi-directional UCOE plasmid vector pBDUpuro1800.

[0040] FIG. 21 is a graph depicting the antibody production rates for illustrative cell lines containing bi-directional UCOE plasmid vectors.

BRIEF DESCRIPTION OF THE SEQUENCE IDENTIFIERS

[0041] SEQ ID NO: 1 is the polynucleotide sequence of pBDUneo100.

[0042] SEQ ID NO: 2 is the polynucleotide sequence of pBDUneo200.

[0043] SEQ ID NO: 3 is the polynucleotide sequence of pBDUpuro300.

[0044] SEQ ID NO: 4 is the polynucleotide sequence of pBDUpuro400.

[0045] SEQ ID NO: 5 is the polynucleotide sequence of pBDUneo500.

[0046] SEQ ID NO: 6 is the polynucleotide sequence of pBDUneo600.

[0047] SEQ ID NO: 7 is the polynucleotide sequence of pBDUpuro700.

[0048] SEQ ID NO: 8 is the polynucleotide sequence of pBDUpuro800.

[0049] SEQ ID NO: 9 is the polynucleotide sequence of vector CET720GFP.

[0050] SEQ ID NOs: 10-26 represent illustrative primer sequences employed in Example 4 for the production of improved UCOE vectors according to the invention.

[0051] SEQ ID NO: 27 is the polynucleotide sequence of pBDUpuro350.

[0052] SEQ ID NO: 28 is the polynucleotide sequence of pBDUpuro450.

[0053] SEQ ID NO: 29 is the polynucleotide sequence of pBDUneo1200.

[0054] SEQ ID NO: 30 is the polynucleotide sequence of pBDUpuro1450.

[0055] SEQ ID NO: 31 is the polynucleotide sequence of pBDUneo1600.

[0056] SEQ ID NO: 32 is the polynucleotide sequence of pBDUpuro1800.

DETAILED DESCRIPTION OF THE INVENTION

[0057] The present invention is directed generally to compositions and methods for use in high-level, large-scale production of recombinant proteins and/or polypeptides. As described further below, illustrative compositions of the present invention include, but are not restricted to, immortalized, serum-free, suspension host cell-lines in combination with one or more expression vectors suitable for the high-level, large-scale expression of recombinant proteins and/or polypeptides.

[0058] The practice of the present invention will employ, unless indicated specifically to the contrary, conventional methods of virology, immunology, microbiology, molecular biology and recombinant DNA techniques within the skill of the art, many of which are described below for the purpose of illustration. Such techniques are explained fully in the literature. See, e.g., Sambrook, et al. Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Maniatis et al. Molecular Cloning: A Laboratory Manual (1982); DNA Cloning: A Practical Approach, vol. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., 1984); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds., 1985); Transcription and Translation (B. Hames & S. Higgins, eds., 1984); Animal Cell Culture (R. Freshney, ed., 1986); Perbal, A Practical Guide to Molecular Cloning (1984).

[0059] All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

[0060] As used in this specification and the appended claims, the singular forms “a,” “an” and “the” include plural references unless the content clearly dictates otherwise.

[0061] Preparation and Selection of Serum-free, Suspension Host Cell-lines

[0062] Host cell-lines ideally suitable for use in the compositions and methods of the present invention may have one or more of the following attributes: (a) capable of immortal, continuous growth in culture; (b) adapted for growth in suspension; (c) rapid growth, preferably 12-16 hour doubling time; (d) high transfection efficiency, preferably at least 70%; (e) susceptibility to selection by standard selection agents, preferably hygromycin, G418 or puromycin; (f) protein glycosylation patterns consistent with use as a human therapeutic, preferably the absence of gal-gal glycosylation pattern; and (g) adapted for growth in serum-free medium, preferably chemically-defined, protein-free growth without indirect animal-derived components.

[0063] A host cell-line having one or more of these attributes may be used to develop a system for the rapid development of recombinant host cell-lines that may be transferred into development and manufacturing with reduced effort and time as compared to existing methodologies for the high-level, large-scale production of recombinant proteins and/or polypeptides.

[0064] For long-term, high-yield production of recombinant proteins, stable expression is generally preferred. For example, cell-lines that stably express a polynucleotide of interest may be transfected using expression vectors which may contain endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in an enriched medium before they are switched to a selective medium. The purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells that successfully express the introduced sequences. Resistant clones of stably transformed cells may be proliferated using tissue culture techniques appropriate to the cell type.

[0065] Any number of selection systems may be used to recover transformed cell-lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler, M. et al. (1977) Cell 11:223-32) and adenine phosphoribosyltransferase (Lowy, I. et al. (1990) Cell 22:817-23) genes which can be employed in tk⁻ or aprt⁻ cells, respectively. Also, antimetabolite, antibiotic or herbicide resistance can be used as the basis for selection; for example, dhfr which confers resistance to methotrexate (Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. 77:3567-70); glutamine synthetase (GS) which confers glutamine-independent growth and resistance to methionine sulphoximine (Bebbington et al. (1992) Biotechnology 10(2):169-75; and Cockett et al. (1991) Nucleic Acids Res. 25;19(2):319-25); npt, which confers resistance to the aminoglycosides, neomycin and G-418 (Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14); and als or pat, which confer resistance to chlorsulfuron and phosphinotricin acetyltransferase, respectively (Murry, supra). Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine (Hartman, S. C. and R. C. Mulligan (1988) Proc. Natl. Acad. Sci. 85:8047-51). The use of visible markers has gained popularity with such markers as anthocyanins, beta-glucuronidase and its substrate GUS, and luciferase and its substrate luciferin, being widely used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system (Rhodes, C. A. et al. (1995) Methods Mol. Biol. 55:121-131).

[0066] Although the presence/absence of marker gene expression suggests that the gene of interest is also present, its presence and expression may need to be confirmed. For example, if the sequence encoding a polypeptide is inserted within a marker gene sequence, recombinant cells containing sequences can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with a polypeptide-encoding sequence under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.

[0067] Alternatively, host cells that contain and express a desired polynucleotide sequence may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations and protein bioassay or immunoassay techniques which include, for example, membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein.

[0068] A variety of protocols for detecting and measuring the expression of polynucleotide-encoded products, using either polyclonal or monoclonal antibodies specific for the product, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on a given polypeptide may be preferred for some applications, but a competitive binding assay may also be employed. These and other assays are described, among other places, in Hampton, R. et al. (1990; Serological Methods, a Laboratory Manual, APS Press, St. Paul, Minn.) and Maddox, D. E. et al. (1983; J. Exp. Med. 158:1211-1216).

[0069] A wide variety of labels and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides include oligolabeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the sequences, or any portions thereof, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety of commercially available kits. Suitable reporter molecules or labels which may be used include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles, and the like.

[0070] Host cells transformed with a polynucleotide sequence of interest may be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The protein produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides of the invention may be designed to contain signal sequences which direct secretion of the encoded polypeptide through a prokaryotic or eukaryotic cell membrane. Other recombinant constructions may be used to join sequences encoding a polypeptide of interest to a nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins. Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp., Seattle, Wash.). The inclusion of cleavable linker sequences such as those specific for Factor XA or enterokinase (Invitrogen) between the purification domain and the encoded polypeptide may be used to facilitate purification. One such expression vector provides for expression of a fusion protein containing a polypeptide of interest and a nucleic acid encoding 6 histidine residues preceding a thioredoxin or an enterokinase cleavage site. The histidine residues facilitate purification on IMAC (immobilized metal ion affinity chromatography) as described in Porath, J. et al. (1992, Prot. Exp. Purif. 3:263-281) while the enterokinase cleavage site provides a means for purifying the desired polypeptide from the fusion protein. A discussion of vectors which contain fusion proteins is provided in Kroll, D. J. et al. (1993; DNA Cell Biol. 12:441-453).

[0071] Serum-free, immortal host cell-lines are readily available from a variety of public and/or commercial sources such as, for example, the American Type Culture Collection (ATCC; Manassas, Va.); Celox (St. Paul, Minn.); Invitrogen (Carlsbad, Calif.); the European and Japanese Cell Banks (ECACC, Salisbury, Wiltshire (UK) and JCRB, Shinjuku, Japan, respectively).

[0072] Suitable host cell-lines may be obtained by selecting an existing host cell-line that possesses one or more of the above attributes and adapting and/or selecting for variants of that host cell-line to obtain the remaining attributes. The use of pre-adapted host cell-lines ensures that the cells are capable of achieving the desired conditions prior to beginning the process of transfection and recombinant protein expression. As noted below, such cell-lines are ideally suited for use in conjunction with UCOE-containing expression vectors because these vector systems are characterized by stable, long-term, high-level protein expression.

[0073] Exemplary suitable host cell-lines that may be modified and/or adapted for use according to the compositions and methods of the present invention include, but are not limited to, the following: (a) 293-F, a human host cell-line; (b) 293-H, a human host cell-line; (c) COS-7L, a monkey host cell-line; (d) D.MEL-2, an insect host cell-line; (e) SF21, an insect host cell-line; (f) SF9, an insect host cell-line; and (g) CHO-S, a Chinese hamster ovary host cell-line.

[0074] For example, a Chinese hamster ovary subclone (CHO-S; Invitrogen/Gibco) that has been adapted to a commercially available chemically defined, protein free medium may be suitably employed in the compositions and methods of the present invention. See, D'Anna et al., Radiation Research 148:260-271 (1997); D'Anna et al., Methods in Cell Science 18:115-125 (1996); Deaven et al., Chromosoma 41:129-144 (1973); Gorfein et al., Animal Cell Technology: Basic & Applied Aspects 9:247-252 (Kluwer Academic Publishers, Netherlands, 1998). The CHO-S host cell-line has a 12 to 16 hour doubling time in shaker flask cultures reaching a peak cell density of 9-11×10⁶ viable cells/ml. The cells are susceptible to hygromycin at 400 μg/ml and geneticin (G418) at 600 μg/ml. The cells grow as attachment independent single cells even in a stationary culture.
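
By way of illustration only, the effect of a 12 to 16 hour doubling time on culture expansion can be estimated with an idealized exponential growth model. The function names and the seeding density used below are hypothetical and are not taken from the disclosure. The model ignores lag phase, nutrient limitation and the plateau that real cultures show as they approach peak density, so the results are rough lower bounds on elapsed time rather than predictions.

```python
import math

# Idealized exponential growth; names and the seeding density are hypothetical.
# Ignores lag phase, nutrient limitation and plateau near peak density.


def cells_after(n0: float, hours: float, doubling_time_h: float) -> float:
    """Cell density after the given time, starting from density n0."""
    return n0 * 2.0 ** (hours / doubling_time_h)


def hours_to_reach(n0: float, n_target: float, doubling_time_h: float) -> float:
    """Time (h) for density to grow from n0 to n_target."""
    return doubling_time_h * math.log2(n_target / n0)


if __name__ == "__main__":
    seed = 2e5   # hypothetical seeding density, viable cells/ml
    peak = 1e7   # within the 9-11 x 10^6 viable cells/ml range given in the text
    for td in (12.0, 16.0):
        hours = hours_to_reach(seed, peak, td)
        print(f"doubling time {td} h: about {hours:.0f} h to reach peak density")
```

The sketch is intended only to show how doubling time scales the duration of an expansion step; measured growth curves should be used for any actual process planning.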

[0075] The presence of the Galα1→3Galβ1→4GlcNAc-R (Gal-Gal) carbohydrate residue on recombinant proteins used clinically has been associated with rapid protein clearance from the serum. Rodent cells typically introduce the terminal Gal-Gal disaccharide into the carbohydrate structures of secreted glycoproteins although the Gal-Gal residue is not found in human glycoproteins. As a result, the ability to produce recombinant protein without this particular carbohydrate structure is advantageous.

[0076] The CHO-S host cell-line is particularly well suited for use in conjunction with expression vectors comprising one or more UCOE elements, as noted herein below. This host cell-line possesses favorable growth characteristics and generates undetectable levels of the Gal-Gal carbohydrate moiety in its surface glycoproteins. Thus, the CHO-S host cell-line is suitable for expression of recombinant proteins and/or polypeptides produced for clinical use.

[0077] Preparation and Selection of Expression Vectors

[0078] Suitable vector systems for expression of recombinant proteins and/or polypeptides according to the present invention may include one or more of the following attributes: (a) ease of manipulation; (b) elements that make high-level expression site-of-integration independent; (c) elements that make expression resistant to silencing/repression thereby allowing for sustained, stable expression over long periods of time; and (d) elements that express at high-levels in different cell types and in different species.

[0079] In order to express a desired protein and/or polypeptide, the nucleotide sequences encoding the polypeptide, or functional equivalents, may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted coding sequence. Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding a polypeptide of interest and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. Such techniques are described, for example, in Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y., and Ausubel, F. M. et al. (1989) Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.

[0080] A variety of expression vector/host systems may be utilized to contain and express polynucleotide sequences. These include, but are not limited to, plasmid or cosmid DNA expression vectors; insect cell systems infected with virus expression vectors (e.g., baculovirus); plant cell systems transformed with virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV); or animal cell systems.

[0081] The “control elements” or “regulatory sequences” present in an expression vector are those non-translated regions of the vector (enhancers, promoters, 5′ and 3′ untranslated regions) which interact with host cellular proteins to carry out transcription and translation. Such elements may vary in their strength and specificity. Depending on the vector system and host utilized, any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used. In mammalian cell systems, promoters from mammalian genes or from mammalian viruses are generally preferred. If it is necessary to generate a cell-line that contains multiple copies of the sequence encoding a polypeptide, vectors containing GS or DHFR selectable markers or vectors based on SV40 or EBV may be advantageously used with an appropriate selectable marker.

[0082] An insect system may also be used to express a polypeptide of interest. For example, in one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. The sequences encoding the polypeptide may be cloned into a non-essential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the polypeptide-encoding sequence will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein. The recombinant viruses may then be used to infect, for example, S. frugiperda cells or Trichoplusia larvae in which the polypeptide of interest may be expressed (Engelhard, E. K. et al. (1994) Proc. Natl. Acad. Sci. 91:3224-3227).

[0083] In mammalian host cells, a number of viral-based expression systems are generally available. For example, in cases where an adenovirus is used as an expression vector, sequences encoding a polypeptide of interest may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain a viable virus which is capable of expressing the polypeptide in infected host cells (Logan, J. and Shenk, T. (1984) Proc. Natl. Acad. Sci. 81:3655-3659). In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.

[0084] Specific initiation signals may also be used to achieve more efficient translation of sequences encoding a polypeptide of interest. Such signals include the ATG initiation codon and adjacent sequences. In cases where sequences encoding the polypeptide, its initiation codon, and upstream sequences are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon should be provided. Furthermore, the initiation codon should be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers which are appropriate for the particular cell system which is used, such as those described in the literature (Scharf, D. et al. (1994) Results Probl. Cell Differ. 20:125-162).

[0085] Exemplary preferred elements suitable for making high-level expression independent of the site of integration include, for example, universal chromatin opening elements (UCOEs). UCOEs are polynucleotide sequences that maintain chromatin in an “open” configuration. See, e.g., Crombie et al., PCT Patent Application No. WO0005393 (2000). Inclusion of a UCOE in an expression vector upstream of the promoter provides high levels of expression that are independent of integration site and are resistant to silencing. Efficient expression can be derived from a single copy of an integrated gene, resulting in a higher percentage of cells expressing the marker gene in the selected pool in comparison to standard non-UCOE containing vectors. This, in combination with the utilization of a serum-free, suspension-adapted parent cell-line, allows for rapid production of large quantities of protein in a short period of time. The increased efficiency obtained with the UCOE vector significantly reduces the number of transfectants which need to be screened in order to obtain a high productivity subclone.

[0086] Utilization of vectors containing one or more UCOEs in a suspension-adapted host cell-line allows for rapid development and scale-up for production of a protein and/or polypeptide such as, for example, an antibody or fragment thereof. UCOEs allow for screening of a small number of subclones to obtain a clone capable of producing at least 50 mg/L of protein and/or polypeptide, more preferably at least 100 mg/L of protein and/or polypeptide, and still more preferably at least 200 mg/L of protein and/or polypeptide in a 5 week period in serum-free conditions.

[0087] Preferably, expression vector systems suitable for use in the compositions and methods of the present invention are capable of yielding expression levels in excess of 1 g protein and/or polypeptide per liter of suspension culture. More preferably, expression vectors are capable of use in stable host cell-lines wherein at least 20 pg protein and/or polypeptide per cell are achieved per day.
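The relationship between a per-cell productivity and a volumetric yield can be illustrated with simple arithmetic. The following is a minimal sketch only: the viable cell density (5 x 10^6 cells/mL) and the 10 day run length are hypothetical values chosen for illustration and are not taken from this disclosure, and the calculation assumes the cell density stays constant over the run.

```python
PG_PER_G = 1e12   # picograms per gram
ML_PER_L = 1e3    # milliliters per liter


def volumetric_titer_g_per_l(pg_per_cell_per_day, cells_per_ml, days):
    """Estimate titer in g/L, assuming a constant viable cell density."""
    pg_per_ml = pg_per_cell_per_day * cells_per_ml * days
    return pg_per_ml * ML_PER_L / PG_PER_G


# 20 pg/cell/day is the figure stated above; density and duration are
# hypothetical illustration values.
titer = volumetric_titer_g_per_l(20, 5e6, 10)
print(f"{titer:.2f} g/L")
```

Under these assumed conditions, 20 pg per cell per day corresponds to roughly 1 g/L; real cultures grow and decline over time, so an actual estimate would integrate the viable cell density over the run.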

[0088] As discussed in detail herein below, within certain embodimentsof the present invention, the protein and/or polypeptide may compriseone or more subunits such as, for example, antibody heavy and lightchains or fragments thereof. As is well understood in the art, efficientfunctional antibody production requires appropriately balancedexpression of the heavy and light chains. Transfection of the two chainson separate plasmids makes maintenance of an equal copy number difficultand provides the potential for transcriptional interference between thegenes if the vectors integrate close to one another in the genome.Consequently, bi-directional vectors for the co-expression of two geneson the same vector may be employed. As disclosed in further detail inthe Examples herein below, exemplary bi-directional UCOE-based vectorsystems, within the scope of the present invention, may, optionally, beconstructed based on the “hybrid” RNP/beta-actin UCOE (CobraTherapeutics). Vectors may comprise one or more antibiotic resistancemarkers such as, e.g., the neomycin or puromycin resistance markers,and/or may comprise one or more mammalian promoter such as, e.g., themurine CMV promoter (mCMV), the human CMV promoter (hCMV), or the humanactin promoters to drive light or heavy chain expression.

[0089] Transfection of Host Cell-lines with Expression Vectors of the Present Invention

[0090] Transfection of a standard host cell-line, preadapted to grow ina large scale setting, allows for more rapid cell-line developmentthereby increasing the transition rate from research into developmentand manufacturing. In contrast, the traditional approach of using aparent cell-line which requires serum free and suspension adaptationafter transfection further increases the need for screening a largenumber of subclones, because many of the subclones will not be able togrow under conditions that allow large scale protein production. Use ofa preadapted cell-line can reduce the time required to develop acell-line from months to weeks. The cell-line is preadapted to achemically defined, protein free media and grows rapidly to high celldensities in a shaker flask or bioreactor.

[0091] Suitable transfection protocols are readily known and/or available to those of skill in the art. Exemplary transfection protocols that are suitable for achieving high-level, large-scale transfection are those recommended by Invitrogen/Gibco for transfection of the CHO-S host cell-line. Generally, positive selection of transfected cells may be achieved using agents such as, for example, hygromycin, G418, and puromycin. Transfection efficiencies are typically at least 70%, more preferably at least 75%, 80%, 85%, 90% or 95%. Following transfection and selection, the pool of resulting clones may, optionally, be further subcloned to identify individual clones with the highest levels of protein expression.

[0092] Selection of Cell Culture Conditions

[0093] Selection and testing of serum-free media suitable for culture of the immortalized suspension cells according to the present invention may be achieved by the skilled artisan by routine experimentation. For CHO-S cells, described herein above, the CD-CHO media is suitable (e.g., available from Invitrogen or Gibco).

[0094] Exemplary Proteins and/or Polypeptides Suitable for High-level, Large-scale Expression

[0095] As used herein, the terms “protein” and “polypeptide” are used intheir conventional meaning, i.e., as a sequence of amino acids. Thepolypeptides are not limited to a specific length of the product; thus,peptides, oligopeptides, and proteins are included within the definitionof polypeptide, and such terms may be used interchangeably herein unlessspecifically indicated otherwise. This term also does not refer to orexclude post-expression modifications of the polypeptide, for example,glycosylations, acetylations, phosphorylations and the like, as well asother modifications known in the art, both naturally occurring andnon-naturally occurring. As noted above, however, preferred proteinsand/or polypeptides according to the present invention lack Gal-Galglycosylation. A polypeptide may be an entire protein, or a subsequencethereof. Particular polypeptides of interest in the context of thisinvention are amino acid subsequences comprising epitopes, i.e.,antigenic determinants substantially responsible for the immunogenicproperties of a polypeptide and being capable of evoking an immuneresponse.

[0096] In certain preferred embodiments, the polypeptides producedand/or employed according to the present invention are immunogenic,i.e., they react detectably within an immunoassay (such as an ELISA orT-cell stimulation assay) with antisera and/or T-cells from a patientwith a cancer. Screening for immunogenic activity can be performed usingtechniques well known to the skilled artisan. For example, such screenscan be performed using methods such as those described in Harlow andLane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory,1988. In one illustrative example, a polypeptide may be immobilized on asolid support and contacted with patient sera to allow binding ofantibodies within the sera to the immobilized polypeptide. Unbound seramay then be removed and bound antibodies detected using, for example,¹²⁵I-labeled Protein A.

[0097] As would be recognized by the skilled artisan, immunogenicportions of the polypeptides produced according to the disclosureprovided herein are also encompassed by the present invention. An“immunogenic portion,” as used herein, is a fragment of an immunogenicpolypeptide of the invention that itself is immunologically reactive(i.e., specifically binds) with the B-cells and/or T-cell surfaceantigen receptors that recognize the polypeptide. Immunogenic portionsmay generally be identified using well known techniques, such as thosesummarized in Paul, Fundamental Immunology, 3rd ed., 243-247 (RavenPress, 1993) and references cited therein. Such techniques includescreening polypeptides for the ability to react with antigen-specificantibodies, antisera and/or T-cell-lines or clones. As used herein,antisera and antibodies are “antigen-specific” if they specifically bindto an antigen (i.e., they react with the protein in an ELISA or otherimmunoassay, and do not react detectably with unrelated proteins). Suchantisera and antibodies may be prepared as described herein, and usingwell-known techniques.

[0098] In one preferred embodiment, an immunogenic portion of apolypeptide of the present invention is a portion that reacts withantisera and/or T-cells at a level that is not substantially less thanthe reactivity of the full-length polypeptide (e.g., in an ELISA and/orT-cell reactivity assay). Preferably, the level of immunogenic activityof the immunogenic portion is at least about 50%, preferably at leastabout 70% and most preferably greater than about 90% of theimmunogenicity for the full-length polypeptide. In some instances,preferred immunogenic portions will be identified that have a level ofimmunogenic activity greater than that of the corresponding full-lengthpolypeptide, e.g., having greater than about 100% or 150% or moreimmunogenic activity.

[0099] In certain other embodiments, illustrative immunogenic portionsmay include peptides in which an N-terminal leader sequence and/ortransmembrane domain have been deleted. Other illustrative immunogenicportions will contain a small N- and/or C-terminal deletion (e.g., 1-30amino acids, preferably 5-15 amino acids), relative to the matureprotein.

[0100] In another embodiment, a protein and/or polypeptide made and/orused according to the present invention may also comprise one or morepolypeptides that are immunologically reactive with T cells and/orantibodies generated against a polypeptide of the invention,particularly a polypeptide having an amino acid sequence disclosedherein, or to an immunogenic fragment or variant thereof.

[0101] A polypeptide “variant,” as the term is used herein, is apolypeptide that typically differs from a polypeptide specificallydisclosed herein in one or more substitutions, deletions, additionsand/or insertions. Such variants may be naturally occurring or may besynthetically generated, for example, by modifying one or more of theabove polypeptide sequences of the invention and evaluating theiractivity as described herein and/or using any of a number of techniqueswell known in the art. Illustrative variant sequences according to thepresent invention are those sequences related by homology to the 8 kbRNP UCOE sequence provided herein, or a subsequence thereof, whichretain a desired degree of UCOE activity.

[0102] In one embodiment, for example, particularly illustrative variantsequences of the invention comprise polynucleotide sequences having atleast 70%, 75%, 80%, 85%, 90%, 95% or 99% or more identity with a UCOEpolynucleotide specifically disclosed herein. Preferably such variantsexhibit at least 70%, 75%, 80%, 85%, 90%, 95% or 100% or more UCOEactivity when compared with the UCOE activity exhibited by the 8 kb RNPUCOE element disclosed herein.

[0103] In many instances, a variant will contain conservativesubstitutions. A “conservative substitution” is one in which an aminoacid is substituted for another amino acid that has similar properties,such that one skilled in the art of peptide chemistry would expect thesecondary structure and hydropathic nature of the polypeptide to besubstantially unchanged. As described above, modifications may be madein the structure of the polynucleotides and polypeptides of the presentinvention and still obtain a functional molecule that encodes a variantor derivative polypeptide with desirable characteristics, e.g., withimmunogenic characteristics. When it is desired to alter the amino acidsequence of a polypeptide to create an equivalent, or even an improved,variant or portion of a polypeptide of the invention, one skilled in theart will typically change one or more of the codons of the encoding DNAsequence according to Table 1.

[0104] For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence, and, of course, its underlying DNA coding sequence, and nevertheless obtain a protein with like properties. It is thus contemplated that various changes may be made in the peptide sequences of the disclosed compositions, or corresponding DNA sequences which encode said peptides without appreciable loss of their biological utility or activity.

TABLE 1

Amino Acids                Codons
Alanine         Ala   A    GCA GCC GCG GCU
Cysteine        Cys   C    UGC UGU
Aspartic acid   Asp   D    GAC GAU
Glutamic acid   Glu   E    GAA GAG
Phenylalanine   Phe   F    UUC UUU
Glycine         Gly   G    GGA GGC GGG GGU
Histidine       His   H    CAC CAU
Isoleucine      Ile   I    AUA AUC AUU
Lysine          Lys   K    AAA AAG
Leucine         Leu   L    UUA UUG CUA CUC CUG CUU
Methionine      Met   M    AUG
Asparagine      Asn   N    AAC AAU
Proline         Pro   P    CCA CCC CCG CCU
Glutamine       Gln   Q    CAA CAG
Arginine        Arg   R    AGA AGG CGA CGC CGG CGU
Serine          Ser   S    AGC AGU UCA UCC UCG UCU
Threonine       Thr   T    ACA ACC ACG ACU
Valine          Val   V    GUA GUC GUG GUU
Tryptophan      Trp   W    UGG
Tyrosine        Tyr   Y    UAC UAU
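By way of illustration only, the codon assignments of Table 1 can be held in a lookup structure and used to translate an RNA coding sequence or to list the synonymous codons available when a codon is to be changed. This is a minimal sketch: the function names are hypothetical, and the three stop codons are added from the standard genetic code because Table 1 lists only the 61 sense codons.

```python
# Codons per amino acid (one-letter code), transcribed from Table 1.
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"], "C": ["UGC", "UGU"],
    "D": ["GAC", "GAU"], "E": ["GAA", "GAG"], "F": ["UUC", "UUU"],
    "G": ["GGA", "GGC", "GGG", "GGU"], "H": ["CAC", "CAU"],
    "I": ["AUA", "AUC", "AUU"], "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"], "M": ["AUG"],
    "N": ["AAC", "AAU"], "P": ["CCA", "CCC", "CCG", "CCU"],
    "Q": ["CAA", "CAG"], "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "S": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "T": ["ACA", "ACC", "ACG", "ACU"], "V": ["GUA", "GUC", "GUG", "GUU"],
    "W": ["UGG"], "Y": ["UAC", "UAU"],
}
# Not part of Table 1: stop codons of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}

CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def translate(rna):
    """Translate an RNA coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TO_AA[codon])  # KeyError on a non-RNA codon
    return "".join(protein)


def synonymous_codons(codon):
    """Return the other codons in Table 1 encoding the same amino acid."""
    return [c for c in CODONS[CODON_TO_AA[codon]] if c != codon]


print(translate("AUGGCUUGGUAA"))
print(synonymous_codons("GAA"))
```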

[0105] In making such changes, the hydropathic index of amino acids maybe considered. The importance of the hydropathic amino acid index inconferring interactive biologic function on a protein is generallyunderstood in the art (Kyte and Doolittle, 1982, incorporated herein byreference). It is accepted that the relative hydropathic character ofthe amino acid contributes to the secondary structure of the resultantprotein, which in turn defines the interaction of the protein with othermolecules, for example, enzymes, substrates, receptors, DNA, antibodies,antigens, and the like. Each amino acid has been assigned a hydropathicindex on the basis of its hydrophobicity and charge characteristics(Kyte and Doolittle, 1982). These values are: isoleucine (+4.5); valine(+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5);methionine (+1.9); alanine (+1.8); glycine (−0.4); threonine (−0.7);serine (−0.8); tryptophan (−0.9); tyrosine (−1.3); proline (−1.6);histidine (−3.2); glutamate (−3.5); glutamine (−3.5); aspartate (−3.5);asparagine (−3.5); lysine (−3.9); and arginine (−4.5).

[0106] It is known in the art that certain amino acids may besubstituted by other amino acids having a similar hydropathic index orscore and still result in a protein with similar biological activity,i.e. still obtain a biological functionally equivalent protein. Inmaking such changes, the substitution of amino acids whose hydropathicindices are within ±2 is preferred, those within ±1 are particularlypreferred, and those within ±0.5 are even more particularly preferred.It is also understood in the art that the substitution of like aminoacids can be made effectively on the basis of hydrophilicity. U.S. Pat.No. 4,554,101 (specifically incorporated herein by reference in itsentirety), states that the greatest local average hydrophilicity of aprotein, as governed by the hydrophilicity of its adjacent amino acids,correlates with a biological property of the protein.
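The hydropathic-index criterion can be sketched as a simple lookup and comparison. The index values below are those listed in the preceding paragraph; the function name and the choice to return the tightest satisfied threshold are illustrative assumptions rather than any prescribed procedure.

```python
# Hydropathic indices as listed above (Kyte and Doolittle, 1982).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_tier(original, replacement):
    """Return the tightest window (0.5, 1.0 or 2.0) that contains the
    difference in hydropathic index, or None if it exceeds 2."""
    diff = abs(HYDROPATHY[original] - HYDROPATHY[replacement])
    for threshold in (0.5, 1.0, 2.0):
        if diff <= threshold + 1e-9:  # tolerance for floating-point rounding
            return threshold
    return None


print(hydropathy_tier("I", "V"))  # difference of 0.3
print(hydropathy_tier("G", "P"))  # difference of 1.2
print(hydropathy_tier("I", "R"))  # difference of 9.0
```

A result of 0.5 corresponds to the "even more particularly preferred" range, 1.0 to "particularly preferred", and 2.0 to "preferred".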

[0107] As detailed in U.S. Pat. No. 4,554,101, the followinghydrophilicity values have been assigned to amino acid residues:arginine (+3.0); lysine (+3.0); aspartate (+3.0±1); glutamate (+3.0±1);serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0);threonine (−0.4); proline (−0.5±1); alanine (−0.5); histidine (−0.5);cysteine (−1.0); methionine (−1.3); valine (−1.5); leucine (−1.8);isoleucine (−1.8); tyrosine (−2.3); phenylalanine (−2.5); tryptophan(−3.4). It is understood that an amino acid can be substituted foranother having a similar hydrophilicity value and still obtain abiologically equivalent, and in particular, an immunologicallyequivalent protein. In such changes, the substitution of amino acidswhose hydrophilicity values are within ±2 is preferred, those within ±1are particularly preferred, and those within ±0.5 are even moreparticularly preferred.
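The "greatest local average hydrophilicity" referred to above can be illustrated with a sliding-window average over the listed values. This is a sketch under stated assumptions: the window length of six residues is a hypothetical default, the central value is used where a value is given with a ±1 range, and the function name is illustrative.

```python
# Hydrophilicity values as listed above; central values are used for the
# entries given as +3.0 ± 1 (Asp, Glu) and -0.5 ± 1 (Pro).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def greatest_local_average(sequence, window=6):
    """Return (start_index, average) of the most hydrophilic window."""
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    averages = [
        sum(HYDROPHILICITY[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]
    best_start = max(range(len(averages)), key=averages.__getitem__)
    return best_start, averages[best_start]


start, average = greatest_local_average("LLLLLLKDEKRDLLLLLL")
print(start, average)
```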

[0108] As outlined above, amino acid substitutions are generally therefore based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Exemplary substitutions that take various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.

[0109] In addition, any polynucleotide may be further modified to increase stability in vivo. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5′ and/or 3′ ends; the use of phosphorothioate or 2′ O-methyl rather than phosphodiester linkages in the backbone; and/or the inclusion of nontraditional bases such as inosine, queuosine and wybutosine, as well as acetyl-, methyl-, thio- and other modified forms of adenine, cytidine, guanine, thymine and uridine.

[0110] Amino acid substitutions may further be made on the basis ofsimilarity in polarity, charge, solubility, hydrophobicity,hydrophilicity and/or the amphipathic nature of the residues. Forexample, negatively charged amino acids include aspartic acid andglutamic acid; positively charged amino acids include lysine andarginine; and amino acids with uncharged polar head groups havingsimilar hydrophilicity values include leucine, isoleucine and valine;glycine and alanine; asparagine and glutamine; and serine, threonine,phenylalanine and tyrosine. Other groups of amino acids that mayrepresent conservative changes include: (1) ala, pro, gly, glu, asp,gln, asn, ser, thr; (2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala,phe; (4) lys, arg, his; and (5) phe, tyr, trp, his. A variant may also,or alternatively, contain nonconservative changes. In a preferredembodiment, variant polypeptides differ from a native sequence bysubstitution, deletion or addition of five amino acids or fewer.Variants may also (or alternatively) be modified by, for example, thedeletion or addition of amino acids that have minimal influence on theimmunogenicity, secondary structure and hydropathic nature of thepolypeptide.

[0111] As noted above, polypeptides may comprise a signal (or leader)sequence at the N-terminal end of the protein, which co-translationallyor post-translationally directs transfer of the protein. The polypeptidemay also be conjugated to a linker or other sequence for ease ofsynthesis, purification or identification of the polypeptide (e.g.,poly-His), or to enhance binding of the polypeptide to a solid support.For example, a polypeptide may be conjugated to an immunoglobulin Fcregion.

[0112] When comparing polypeptide sequences, two sequences are said tobe “identical” if the sequence of amino acids in the two sequences isthe same when aligned for maximum correspondence, as described below.Comparisons between two sequences are typically performed by comparingthe sequences over a comparison window to identify and compare localregions of sequence similarity. A “comparison window” as used herein,refers to a segment of at least about 20 contiguous positions, usually30 to about 75, 40 to about 50, in which a sequence may be compared to areference sequence of the same number of contiguous positions after thetwo sequences are optimally aligned.

[0113] Optimal alignment of sequences for comparison may be conducted using the Megalign program in the Lasergene suite of bioinformatics software (DNASTAR, Inc., Madison, Wis.), using default parameters. This program embodies several alignment schemes described in the following references: Dayhoff, M. O. (1978) A model of evolutionary change in proteins: Matrices for detecting distant relationships. In Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington DC, Vol. 5, Suppl. 3, pp. 345-358; Hein, J. (1990) Unified Approach to Alignment and Phylogenies, pp. 626-645, Methods in Enzymology vol. 183, Academic Press, Inc., San Diego, Calif.; Higgins, D. G. and Sharp, P. M. (1989) CABIOS 5:151-153; Myers, E. W. and Muller, W. (1988) CABIOS 4:11-17; Robinson, E. D. (1971) Comb. Theor. 11:105; Saitou, N. and Nei, M. (1987) Mol. Biol. Evol. 4:406-425; Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy: the Principles and Practice of Numerical Taxonomy, Freeman Press, San Francisco, Calif.; Wilbur, W. J. and Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80:726-730.

[0114] Alternatively, optimal alignment of sequences for comparison may be conducted by the local identity algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, by the identity alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity methods of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.

[0115] One preferred example of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1997) Nucl. Acids Res. 25:3389-3402 and Altschul et al. (1990) J. Mol. Biol. 215:403-410, respectively. BLAST and BLAST 2.0 can be used, for example with the parameters described herein, to determine percent sequence identity for the polynucleotides and polypeptides of the invention. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. For amino acid sequences, a scoring matrix can be used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment.
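The three halting conditions for extending a word hit can be sketched as follows. This is a simplified illustration and not the BLAST implementation: it extends in one direction only, permits no gaps, and uses hypothetical match and mismatch scores in place of a scoring matrix. The function name and default parameter values are likewise assumptions.

```python
def extend_right(query, subject, q_pos, s_pos, seed_score,
                 x_drop=5, match=1, mismatch=-2):
    """Extend a word hit to the right without gaps.

    Returns (best_score, best_length), where best_length is the number of
    positions past the seed at which the best score was reached.
    """
    score = best_score = seed_score
    best_length = 0
    steps = 0
    # Condition 3: stop when the end of either sequence is reached.
    while q_pos + steps < len(query) and s_pos + steps < len(subject):
        same = query[q_pos + steps] == subject[s_pos + steps]
        score += match if same else mismatch
        steps += 1
        if score > best_score:
            best_score, best_length = score, steps
        # Condition 1: score fell by X from its maximum achieved value.
        # Condition 2: cumulative score reached zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_length


# A 4-letter seed (score 4) at position 0 is extended from position 4.
print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 4, 4, seed_score=4))
```

A larger X lets the extension tolerate longer runs of mismatches before giving up, which increases sensitivity at the cost of speed.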

[0116] In one preferred approach, the “percentage of sequence identity” is determined by comparing two optimally aligned sequences over a window of comparison of at least 20 positions, wherein the portion of the polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less, usually 5 to 15 percent, or 10 to 12 percent, as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the reference sequence (i.e., the window size) and multiplying the result by 100 to yield the percentage of sequence identity.
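The calculation just described can be sketched for two sequences that have already been aligned. The function name and the use of "-" as the gap character are illustrative assumptions; the short example sequences are hypothetical and are shorter than the 20-position window recited above purely for readability.

```python
def percent_identity(aligned_reference, aligned_query, gap="-"):
    """Percent identity = matched positions / reference positions * 100."""
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Window size: number of residues in the reference (gaps excluded).
    window = sum(1 for r in aligned_reference if r != gap)
    if window == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for r, q in zip(aligned_reference, aligned_query)
        if r == q and r != gap
    )
    return 100.0 * matches / window


# One deletion (F) and one substitution (K to R) relative to the reference.
print(percent_identity("ACDEFGHIKL", "ACDE-GHIRL"))
```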

[0117] Within other illustrative embodiments, a polypeptide producedand/or employed according to the present invention may be a xenogeneicpolypeptide that comprises a polypeptide having substantial sequenceidentity, as described above, to the human polypeptide (also termedautologous antigen) which served as a reference polypeptide, but whichxenogeneic polypeptide is derived from a different, non-human species.One skilled in the art will recognize that “self” antigens are oftenpoor stimulators of CD8+ and CD4+ T-lymphocyte responses, and thereforeefficient immunotherapeutic strategies directed against tumorpolypeptides require the development of methods to overcome immunetolerance to particular self tumor polypeptides. For example, humansimmunized with prostase protein from a xenogeneic (non human) origin arecapable of mounting an immune response against the counterpart humanprotein, e.g. the human prostase tumor protein present on human tumorcells. Therefore, one aspect of the present invention providesxenogeneic variants of the protein and/or polypeptides described herein.

[0118] More particularly, the invention is directed to mouse, rat, monkey, porcine and other non-human polypeptides which can be used as xenogeneic forms of human polypeptides set forth herein.

[0119] Within other illustrative embodiments, the present invention mayemploy and/or produce a fusion polypeptide that comprises multiplepolypeptides and/or polypeptide subunits, as described herein, or thatcomprises at least one polypeptide as described herein and an unrelatedsequence. A fusion partner may, for example, assist in providing Thelper epitopes (an immunological fusion partner), preferably T helperepitopes recognized by humans, or may assist in expressing the protein(an expression enhancer) at higher yields than the native recombinantprotein. Certain preferred fusion partners are both immunological andexpression enhancing fusion partners. Other fusion partners may beselected so as to increase the solubility of the polypeptide or toenable the polypeptide to be targeted to desired intracellularcompartments. Still further fusion partners include affinity tags, whichfacilitate purification of the polypeptide.

[0120] Fusion polypeptides may generally be prepared using standardtechniques, including chemical conjugation. Preferably, a fusionpolypeptide is expressed as a recombinant polypeptide employingcompositions and methods of the present invention, and allowing theproduction of increased levels in an expression system. Briefly, forexample, DNA sequences encoding the polypeptide components may beassembled separately, and ligated into an appropriate expression vector.The 3′ end of the DNA sequence encoding one polypeptide component isligated, with or without a peptide linker, to the 5′ end of a DNAsequence encoding the second polypeptide component so that the readingframes of the sequences are in phase. This permits translation into asingle fusion polypeptide that retains the biological activity of bothcomponent polypeptides.

[0121] A peptide linker sequence may be employed to separate the firstand second polypeptide components by a distance sufficient to ensurethat each polypeptide folds into its secondary and tertiary structures.Such a peptide linker sequence is incorporated into the fusionpolypeptide using standard techniques well known in the art. Suitablepeptide linker sequences may be chosen based on the following factors:(1) their ability to adopt a flexible extended conformation; (2) theirinability to adopt a secondary structure that could interact withfunctional epitopes on the first and second polypeptides; and (3) thelack of hydrophobic or charged residues that might react with thepolypeptide functional epitopes. Preferred peptide linker sequencescontain Gly, Asn and Ser residues. Other near neutral amino acids, suchas Thr and Ala may also be used in the linker sequence. Amino acidsequences which may be usefully employed as linkers include thosedisclosed in Maratea et al., Gene 40:39-46, 1985; Murphy et al., Proc.Natl. Acad. Sci. USA 83:8258-8262, 1986; U.S. Pat. No. 4,935,233 andU.S. Pat. No. 4,751,180. The linker sequence may generally be from 1 toabout 50 amino acids in length. Linker sequences are not required whenthe first and second polypeptides have non-essential N-terminal aminoacid regions that can be used to separate the functional domains andprevent steric interference.

[0122] The ligated DNA sequences are operably linked to suitable transcriptional or translational regulatory elements. The regulatory elements responsible for expression of DNA are located only 5′ to the DNA sequence encoding the first polypeptide. Similarly, stop codons required to end translation and transcription termination signals are only present 3′ to the DNA sequence encoding the second polypeptide.
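The in-frame joining of two coding sequences, with a stop codon present only 3′ to the second sequence, can be illustrated at the sequence level. This is a sketch with hypothetical names and toy sequences: it checks only that each piece is a whole number of codons and removes a terminal stop codon from the first component, and it does not model cloning sites or regulatory elements.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard DNA stop codons


def fuse_in_frame(first_cds, second_cds, linker=""):
    """Join two coding sequences (5' to 3') so the reading frames are in
    phase, keeping a stop codon only at the 3' end of the second one."""
    for name, seq in (("first", first_cds), ("linker", linker),
                      ("second", second_cds)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence is not a whole number of codons")
    if first_cds[-3:] in STOP_CODONS:
        first_cds = first_cds[:-3]  # drop the stop between the components
    return first_cds + linker + second_cds


# Toy sequences; GGTTCT encodes a Gly-Ser linker.
fusion = fuse_in_frame("ATGGCTTAA", "ATGTGGTAA", linker="GGTTCT")
print(fusion)
```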

[0123] The fusion polypeptide can comprise a polypeptide made and/or described herein together with an unrelated protein, such as an immunogenic protein capable of eliciting a recall response. Examples of such proteins include tetanus, tuberculosis and hepatitis proteins (see, for example, Stoute et al. New Engl. J. Med., 336:86-91, 1997).

[0124] In one preferred embodiment, the immunological fusion partner is derived from a Mycobacterium sp., such as a Mycobacterium tuberculosis-derived Ra12 fragment. Ra12 compositions and methods for their use in enhancing the expression and/or immunogenicity of heterologous polynucleotide/polypeptide sequences are described in U.S. patent application Ser. No. 60/158,585, the disclosure of which is incorporated herein by reference in its entirety. Briefly, Ra12 refers to a polynucleotide region that is a subsequence of a Mycobacterium tuberculosis MTB32A nucleic acid. MTB32A is a serine protease of 32 KD molecular weight encoded by a gene in virulent and avirulent strains of M. tuberculosis. The nucleotide sequence and amino acid sequence of MTB32A have been described (for example, U.S. patent application Ser. No. 60/158,585; see also, Skeiky et al., Infection and Immun. (1999) 67:3998-4007, incorporated herein by reference). C-terminal fragments of the MTB32A coding sequence express at high levels and remain as soluble polypeptides throughout the purification process. Moreover, Ra12 may enhance the immunogenicity of heterologous immunogenic polypeptides with which it is fused. One preferred Ra12 fusion polypeptide comprises a 14 KD C-terminal fragment corresponding to amino acid residues 192 to 323 of MTB32A. Other preferred Ra12 polynucleotides generally comprise at least about 15 consecutive nucleotides, at least about 30 nucleotides, at least about 60 nucleotides, at least about 100 nucleotides, at least about 200 nucleotides, or at least about 300 nucleotides that encode a portion of a Ra12 polypeptide. Ra12 polynucleotides may comprise a native sequence (i.e., an endogenous sequence that encodes a Ra12 polypeptide or a portion thereof) or may comprise a variant of such a sequence. Ra12 polynucleotide variants may contain one or more substitutions, additions, deletions and/or insertions such that the biological activity of the encoded fusion polypeptide is not substantially diminished, relative to a fusion polypeptide comprising a native Ra12 polypeptide. Variants preferably exhibit at least about 70% identity, more preferably at least about 80% identity and most preferably at least about 90% identity to a polynucleotide sequence that encodes a native Ra12 polypeptide or a portion thereof.

[0125] Within other preferred embodiments, an immunological fusion partner is derived from protein D, a surface protein of the gram-negative bacterium Haemophilus influenzae B (WO 91/18926). Preferably, a protein D derivative comprises approximately the first third of the protein (e.g., the first N-terminal 100-110 amino acids), and a protein D derivative may be lipidated. Within certain preferred embodiments, the first 109 residues of a Lipoprotein D fusion partner are included on the N-terminus to provide the polypeptide with additional exogenous T-cell epitopes and to increase the expression level in E. coli (thus functioning as an expression enhancer). The lipid tail ensures optimal presentation of the antigen to antigen presenting cells. Other fusion partners include the non-structural protein from influenza virus, NS1 (hemagglutinin). Typically, the N-terminal 81 amino acids are used, although different fragments that include T-helper epitopes may be used.

[0126] In another embodiment, the immunological fusion partner is theprotein known as LYTA, or a portion thereof (preferably a C-terminalportion). LYTA is derived from Streptococcus pneumoniae, whichsynthesizes an N-acetyl-L-alanine amidase known as amidase LYTA (encodedby the LytA gene; Gene 43:265-292, 1986). LYTA is an autolysin thatspecifically degrades certain bonds in the peptidoglycan backbone. TheC-terminal domain of the LYTA protein is responsible for the affinity tothe choline or to some choline analogues such as DEAE. This property hasbeen exploited for the development of E. coli C-LYTA expressing plasmidsuseful for expression of fusion proteins. Purification of hybridproteins containing the C-LYTA fragment at the amino terminus has beendescribed (see Biotechnology 10:795-798, 1992). Within a preferredembodiment, a repeat portion of LYTA may be incorporated into a fusionpolypeptide. A repeat portion is found in the C-terminal region startingat residue 178. A particularly preferred repeat portion incorporatesresidues 188-305.

[0127] Yet another illustrative embodiment involves fusion polypeptides, and the polynucleotides encoding them, wherein the fusion partner comprises a targeting signal capable of directing a polypeptide to the endosomal/lysosomal compartment, as described in U.S. Pat. No. 5,633,234. An immunogenic polypeptide of the invention, when fused with this targeting signal, will associate more efficiently with MHC class II molecules and thereby provide enhanced in vivo stimulation of CD4⁺ T-cells specific for the polypeptide.

[0128] In general, proteins and/or polypeptides (including fusion polypeptides) of the invention are isolated. An “isolated” polypeptide is one that is removed from its original environment. For example, a naturally-occurring protein or polypeptide is isolated if it is separated from some or all of the coexisting materials in the natural system. Preferably, such polypeptides are also purified, e.g., are at least about 90% pure, more preferably at least about 95% pure and most preferably at least about 99% pure.

[0129] Particularly preferred polypeptides produced by the methods of the present invention include binding agents, such as antibodies and antigen-binding fragments thereof, that exhibit immunological binding to a target polypeptide of interest, such as a polypeptide associated with a particular disease state, or to a portion, variant or derivative thereof. An antibody, or antigen-binding fragment thereof, is said to “specifically bind,” “immunologically bind,” and/or be “immunologically reactive” to a polypeptide of the invention if it reacts at a detectable level (within, for example, an ELISA assay) with the polypeptide, and does not react detectably with unrelated polypeptides under similar conditions.

[0130] Immunological binding, as used in this context, generally refers to the non-covalent interactions of the type which occur between an immunoglobulin molecule and an antigen for which the immunoglobulin is specific. The strength, or affinity, of immunological binding interactions can be expressed in terms of the dissociation constant (K_(d)) of the interaction, wherein a smaller K_(d) represents a greater affinity. Immunological binding properties of selected polypeptides can be quantified using methods well known in the art. One such method entails measuring the rates of antigen-binding site/antigen complex formation and dissociation, wherein those rates depend on the concentrations of the complex partners, the affinity of the interaction, and on geometric parameters that equally influence the rate in both directions. Thus, both the “on rate constant” (K_(on)) and the “off rate constant” (K_(off)) can be determined by calculation of the concentrations and the actual rates of association and dissociation. The ratio of K_(off)/K_(on) enables cancellation of all parameters not related to affinity, and is thus equal to the dissociation constant K_(d). See, generally, Davies et al. (1990) Annual Rev. Biochem. 59:439-473.
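
The relationship K_(d) = K_(off)/K_(on) described above can be illustrated with a minimal Python sketch. The function name and the rate constants are hypothetical values chosen only for illustration; they are not measurements from this disclosure.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_d = K_off / K_on.

    k_on  : association ("on") rate constant, in 1/(M*s)
    k_off : dissociation ("off") rate constant, in 1/s
    The result is in molar units (M).
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


# Hypothetical rate constants for two illustrative antibodies.
kd_a = dissociation_constant(k_on=1.0e5, k_off=1.0e-4)  # about 1e-9 M
kd_b = dissociation_constant(k_on=1.0e5, k_off=1.0e-2)  # about 1e-7 M

# A smaller K_d represents a greater affinity.
higher_affinity = "A" if kd_a < kd_b else "B"
print(f"K_d(A) = {kd_a:.1e} M, K_d(B) = {kd_b:.1e} M, higher affinity: {higher_affinity}")
```

In this sketch the two antibodies share the same on rate, so the slower off rate alone accounts for the hundred-fold difference in K_(d).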

[0131] An “antigen-binding site,” or “binding portion” of an antibody refers to the part of the immunoglobulin molecule that participates in antigen binding. The antigen binding site is formed by amino acid residues of the N-terminal variable (“V”) regions of the heavy (“H”) and light (“L”) chains. Three highly divergent stretches within the V regions of the heavy and light chains are referred to as “hypervariable regions” which are interposed between more conserved flanking stretches known as “framework regions,” or “FRs”. Thus the term “FR” refers to amino acid sequences which are naturally found between and adjacent to hypervariable regions in immunoglobulins. In an antibody molecule, the three hypervariable regions of a light chain and the three hypervariable regions of a heavy chain are disposed relative to each other in three dimensional space to form an antigen-binding surface. The antigen-binding surface is complementary to the three-dimensional surface of a bound antigen, and the three hypervariable regions of each of the heavy and light chains are referred to as “complementarity-determining regions,” or “CDRs.”

[0132] Certain binding agents, such as those specific for a tumor-associated protein, will be further capable of differentiating between patients with and without a cancer using the representative assays provided herein and known in the art. For example, antibodies or other binding agents that bind to a tumor protein will preferably generate a signal indicating the presence of a cancer in at least about 20% of patients with the disease, more preferably at least about 30% of patients. Alternatively, or in addition, the antibody will generate a negative signal indicating the absence of the disease in at least about 90% of individuals without the cancer. To determine whether a binding agent satisfies this requirement, biological samples (e.g., blood, sera, sputum, urine and/or tumor biopsies) from patients with and without a cancer (as determined using standard clinical tests) may be assayed as described herein for the presence of polypeptides that bind to the binding agent. Preferably, a statistically significant number of samples with and without the disease will be assayed. Each binding agent should satisfy the above criteria; however, those of ordinary skill in the art will recognize that binding agents may be used in combination to improve sensitivity. Other binding agents produced according to the present invention will also have therapeutic value based on their specificity for tumor-associated polypeptide sequences.
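
The thresholds recited above (a positive signal in at least about 20% of patients with the disease, and a negative signal in at least about 90% of individuals without it) correspond to the usual notions of sensitivity and specificity. The following Python sketch shows one way such a check might be computed. The function names and the sample counts are hypothetical and are given for illustration only.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of samples from patients with the disease that give a positive signal."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of samples from individuals without the disease that give a negative signal."""
    return true_neg / (true_neg + false_pos)


def check_binding_agent(true_pos, false_neg, true_neg, false_pos,
                        min_sensitivity=0.20, min_specificity=0.90):
    """Report each criterion separately, since the text recites them
    as alternatives that may also be applied together."""
    sens = sensitivity(true_pos, false_neg)
    spec = specificity(true_neg, false_pos)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "meets_sensitivity": sens >= min_sensitivity,
        "meets_specificity": spec >= min_specificity,
    }


# Hypothetical assay: 12 of 40 patient samples positive, 47 of 50 control samples negative.
result = check_binding_agent(true_pos=12, false_neg=28, true_neg=47, false_pos=3)
print(result)
```

The default thresholds can be raised (e.g., `min_sensitivity=0.30`) to test the more preferred criterion.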

[0133] Any agent that satisfies the above requirements may be a binding agent. For example, a binding agent may be a ribosome, with or without a peptide component, an RNA molecule or a polypeptide. In a preferred embodiment, a binding agent is an antibody or an antigen-binding fragment thereof. Antibodies may be prepared by any of a variety of techniques known to those of ordinary skill in the art. See, e.g., Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988. In addition to the methods exemplified herein according to the present invention, numerous antibody production techniques are available to the skilled artisan. For example, antibodies can also be produced by cell culture techniques, including the generation of monoclonal antibodies as described herein, or via transfection of antibody genes into suitable bacterial or mammalian cell hosts, in order to allow for the production of recombinant antibodies. In one technique, an immunogen comprising the polypeptide is initially injected into any of a wide variety of mammals (e.g., mice, rats, rabbits, sheep or goats). In this step, the polypeptides of this invention may serve as the immunogen without modification. Alternatively, particularly for relatively short polypeptides, a superior immune response may be elicited if the polypeptide is joined to a carrier protein, such as bovine serum albumin or keyhole limpet hemocyanin. The immunogen is injected into the animal host, preferably according to a predetermined schedule incorporating one or more booster immunizations, and the animals are bled periodically. Polyclonal antibodies specific for the polypeptide may then be purified from such antisera by, for example, affinity chromatography using the polypeptide coupled to a suitable solid support.

[0134] Monoclonal antibodies specific for an antigenic polypeptide of interest may be prepared, for example, using the technique of Kohler and Milstein, Eur. J. Immunol. 6:511-519, 1976, and improvements thereto. Briefly, these methods involve the preparation of immortal cell-lines capable of producing antibodies having the desired specificity (i.e., reactivity with the polypeptide of interest). Such cell-lines may be produced, for example, from spleen cells obtained from an animal immunized as described above. The spleen cells are then immortalized by, for example, fusion with a myeloma cell fusion partner, preferably one that is syngeneic with the immunized animal. A variety of fusion techniques may be employed. For example, the spleen cells and myeloma cells may be combined with a nonionic detergent for a few minutes and then plated at low density on a selective medium that supports the growth of hybrid cells, but not myeloma cells. A preferred selection technique uses HAT (hypoxanthine, aminopterin, thymidine) selection. After a sufficient time, usually about 1 to 2 weeks, colonies of hybrids are observed. Single colonies are selected and their culture supernatants tested for binding activity against the polypeptide. Hybridomas having high reactivity and specificity are preferred.

[0135] Monoclonal antibodies may be isolated from the supernatants of growing hybridoma colonies. In addition, various techniques may be employed to enhance the yield, such as injection of the hybridoma cell-line into the peritoneal cavity of a suitable vertebrate host, such as a mouse. Monoclonal antibodies may then be harvested from the ascites fluid or the blood. Contaminants may be removed from the antibodies by conventional techniques, such as chromatography, gel filtration, precipitation, and extraction. The polypeptides of this invention may be used in the purification process in, for example, an affinity chromatography step.

[0136] A number of therapeutically useful molecules are known in the art which comprise antigen-binding sites that are capable of exhibiting immunological binding properties of an antibody molecule. The proteolytic enzyme papain preferentially cleaves IgG molecules to yield several fragments, two of which (the “F(ab)” fragments) each comprise a covalent heterodimer that includes an intact antigen-binding site. The enzyme pepsin is able to cleave IgG molecules to provide several fragments, including the “F(ab′)₂” fragment which comprises both antigen-binding sites. An “Fv” fragment can be produced by preferential proteolytic cleavage of an IgM, and on rare occasions IgG or IgA immunoglobulin molecule. Fv fragments are, however, more commonly derived using recombinant techniques known in the art. The Fv fragment includes a non-covalent V_(H)::V_(L) heterodimer including an antigen-binding site which retains much of the antigen recognition and binding capabilities of the native antibody molecule. Inbar et al. (1972) Proc. Nat. Acad. Sci. USA 69:2659-2662; Hochman et al. (1976) Biochem 15:2706-2710; and Ehrlich et al. (1980) Biochem 19:4091-4096.

[0137] A single chain Fv (“sFv”) polypeptide is a covalently linked V_(H)::V_(L) heterodimer which is expressed from a gene fusion including V_(H)- and V_(L)-encoding genes linked by a peptide-encoding linker. Huston et al. (1988) Proc. Nat. Acad. Sci. USA 85(16):5879-5883. A number of methods have been described to discern chemical structures for converting the naturally aggregated, but chemically separated, light and heavy polypeptide chains from an antibody V region into an sFv molecule which will fold into a three dimensional structure substantially similar to the structure of an antigen-binding site. See, e.g., U.S. Pat. Nos. 5,091,513 and 5,132,405, to Huston et al.; and U.S. Pat. No. 4,946,778, to Ladner et al.

[0138] Each of the above-described molecules includes a heavy chain and a light chain CDR set, respectively interposed between a heavy chain and a light chain FR set which provide support to the CDRs and define the spatial relationship of the CDRs relative to each other. As used herein, the term “CDR set” refers to the three hypervariable regions of a heavy or light chain V region. Proceeding from the N-terminus of a heavy or light chain, these regions are denoted as “CDR1,” “CDR2,” and “CDR3” respectively. An antigen-binding site, therefore, includes six CDRs, comprising the CDR set from each of a heavy and a light chain V region. A polypeptide comprising a single CDR (e.g., a CDR1, CDR2 or CDR3) is referred to herein as a “molecular recognition unit.” Crystallographic analysis of a number of antigen-antibody complexes has demonstrated that the amino acid residues of CDRs form extensive contact with bound antigen, wherein the most extensive antigen contact is with the heavy chain CDR3. Thus, the molecular recognition units are primarily responsible for the specificity of an antigen-binding site.

[0139] As used herein, the term “FR set” refers to the four flanking amino acid sequences which frame the CDRs of a CDR set of a heavy or light chain V region. Some FR residues may contact bound antigen; however, FRs are primarily responsible for folding the V region into the antigen-binding site, particularly the FR residues directly adjacent to the CDRs. Within FRs, certain amino acid residues and certain structural features are very highly conserved. In this regard, all V region sequences contain an internal disulfide loop of around 90 amino acid residues. When the V regions fold into a binding-site, the CDRs are displayed as projecting loop motifs which form an antigen-binding surface. It is generally recognized that there are conserved structural regions of FRs which influence the folded shape of the CDR loops into certain “canonical” structures, regardless of the precise CDR amino acid sequence. Further, certain FR residues are known to participate in non-covalent interdomain contacts which stabilize the interaction of the antibody heavy and light chains.

[0140] A number of “humanized” antibody molecules comprising an antigen-binding site derived from a non-human immunoglobulin have been described, including chimeric antibodies having rodent V regions and their associated CDRs fused to human constant domains (Winter et al. (1991) Nature 349:293-299; Lobuglio et al. (1989) Proc. Nat. Acad. Sci. USA 86:4220-4224; Shaw et al. (1987) J. Immunol. 138:4534-4538; and Brown et al. (1987) Cancer Res. 47:3577-3583), rodent CDRs grafted into a human supporting FR prior to fusion with an appropriate human antibody constant domain (Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al. (1988) Science 239:1534-1536; and Jones et al. (1986) Nature 321:522-525), and rodent CDRs supported by recombinantly veneered rodent FRs (European Patent Publication No. 519,596, published Dec. 23, 1992). These “humanized” molecules are designed to minimize unwanted immunological response toward rodent antihuman antibody molecules which limits the duration and effectiveness of therapeutic applications of those moieties in human recipients.

[0141] As used herein, the terms “veneered FRs” and “recombinantly veneered FRs” refer to the selective replacement of FR residues from, e.g., a rodent heavy or light chain V region, with human FR residues in order to provide a xenogeneic molecule comprising an antigen-binding site which retains substantially all of the native FR polypeptide folding structure. Veneering techniques are based on the understanding that the ligand binding characteristics of an antigen-binding site are determined primarily by the structure and relative disposition of the heavy and light chain CDR sets within the antigen-binding surface. Davies et al. (1990) Ann. Rev. Biochem. 59:439-473. Thus, antigen binding specificity can be preserved in a humanized antibody only wherein the CDR structures, their interaction with each other, and their interaction with the rest of the V region domains are carefully maintained. By using veneering techniques, exterior (e.g., solvent-accessible) FR residues which are readily encountered by the immune system are selectively replaced with human residues to provide a hybrid molecule that comprises either a weakly immunogenic, or substantially non-immunogenic veneered surface.

[0142] The process of veneering makes use of the available sequence data for human antibody variable domains compiled by Kabat et al., in Sequences of Proteins of Immunological Interest, 4th ed., (U.S. Dept. of Health and Human Services, U.S. Government Printing Office, 1987), updates to the Kabat database, and other accessible U.S. and foreign databases (both nucleic acid and protein). Solvent accessibilities of V region amino acids can be deduced from the known three-dimensional structure for human and murine antibody fragments. There are two general steps in veneering a murine antigen-binding site. Initially, the FRs of the variable domains of an antibody molecule of interest are compared with corresponding FR sequences of human variable domains obtained from the above-identified sources. The most homologous human V regions are then compared residue by residue to corresponding murine amino acids. The residues in the murine FR which differ from the human counterpart are replaced by the residues present in the human moiety using recombinant techniques well known in the art. Residue switching is only carried out with moieties which are at least partially exposed (solvent accessible), and care is exercised in the replacement of amino acid residues which may have a significant effect on the tertiary structure of V region domains, such as proline, glycine and charged amino acids.

[0143] In this manner, the resultant “veneered” murine antigen-binding sites are designed to retain the murine CDR residues, the residues substantially adjacent to the CDRs, the residues identified as buried or mostly buried (solvent inaccessible), the residues believed to participate in non-covalent (e.g., electrostatic and hydrophobic) contacts between heavy and light chain domains, and the residues from conserved structural regions of the FRs which are believed to influence the “canonical” tertiary structures of the CDR loops. These design criteria are then used to prepare recombinant nucleotide sequences which combine the CDRs of both the heavy and light chain of a murine antigen-binding site into human-appearing FRs that can be used to transfect mammalian cells for the expression of recombinant human antibodies which exhibit the antigen specificity of the murine antibody molecule.
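
The residue-by-residue comparison and the retention criteria described above can be sketched in Python as follows. This is a toy illustration only: the function name, the short sequences, and the sets of exposed and protected positions are all hypothetical, the two sequences are assumed to be pre-aligned and of equal length, and histidine is not treated as charged here. In practice, solvent accessibility and the positions to be retained would be taken from structural data and from the Kabat compilation.

```python
# Residues whose replacement calls for extra care: proline, glycine,
# and charged amino acids (Asp, Glu, Lys, Arg in this sketch).
CAUTION_RESIDUES = set("PGDEKR")


def veneer_framework(murine_fr, human_fr, exposed, protected=frozenset()):
    """Replace murine FR residues with human residues at exposed positions.

    murine_fr, human_fr : pre-aligned framework sequences of equal length
    exposed             : 0-based positions that are solvent accessible
    protected           : positions retained regardless (e.g., adjacent to
                          CDRs or involved in interdomain contacts)
    Returns the veneered sequence and the replaced positions that involve
    a caution residue and so deserve manual review.
    """
    if len(murine_fr) != len(human_fr):
        raise ValueError("sequences must be aligned to the same length")
    veneered, review = [], []
    for pos, (m_res, h_res) in enumerate(zip(murine_fr, human_fr)):
        if m_res != h_res and pos in exposed and pos not in protected:
            veneered.append(h_res)
            if m_res in CAUTION_RESIDUES or h_res in CAUTION_RESIDUES:
                review.append(pos)
        else:
            veneered.append(m_res)  # buried, protected, or already identical
    return "".join(veneered), review


# Hypothetical eight-residue framework fragments.
seq, review = veneer_framework("QVKLQESG", "QVQLVESG",
                               exposed={2, 4, 6}, protected={4})
print(seq, review)
```

Position 2 is replaced (and flagged, because a charged lysine is exchanged), while position 4 differs but is retained because it is marked as protected.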

[0144] In another embodiment of the invention, antibodies produced according to the present invention may be coupled to one or more therapeutic agents. Suitable agents in this regard include radionuclides, differentiation inducers, drugs, toxins, and derivatives thereof. Preferred radionuclides include ⁹⁰Y, ¹²³I, ¹²⁵I, ¹³¹I, ¹⁸⁶Re, ¹⁸⁸Re, ²¹¹At, and ²¹²Bi. Preferred drugs include methotrexate, and pyrimidine and purine analogs. Preferred differentiation inducers include phorbol esters and butyric acid. Preferred toxins include ricin, abrin, diphtheria toxin, cholera toxin, gelonin, Pseudomonas exotoxin, Shigella toxin, and pokeweed antiviral protein.

[0145] A therapeutic agent may be coupled (e.g., covalently bonded) to a suitable monoclonal antibody either directly or indirectly (e.g., via a linker group). A direct reaction between an agent and an antibody is possible when each possesses a substituent capable of reacting with the other. For example, a nucleophilic group, such as an amino or sulfhydryl group, on one may be capable of reacting with a carbonyl-containing group, such as an anhydride or an acid halide, or with an alkyl group containing a good leaving group (e.g., a halide) on the other.

[0146] Alternatively, it may be desirable to couple a therapeutic agent and an antibody via a linker group. A linker group can function as a spacer to distance an antibody from an agent in order to avoid interference with binding capabilities. A linker group can also serve to increase the chemical reactivity of a substituent on an agent or an antibody, and thus increase the coupling efficiency. An increase in chemical reactivity may also facilitate the use of agents, or functional groups on agents, which otherwise would not be possible.

[0147] It will be evident to those skilled in the art that a variety of bifunctional or polyfunctional reagents, both homo- and hetero-functional (such as those described in the catalog of the Pierce Chemical Co., Rockford, Ill.), may be employed as the linker group. Coupling may be effected, for example, through amino groups, carboxyl groups, sulfhydryl groups or oxidized carbohydrate residues. There are numerous references describing such methodology, e.g., U.S. Pat. No. 4,671,958, to Rodwell et al.

[0148] Where a therapeutic agent is more potent when free from the antibody portion of the immunoconjugates of the present invention, it may be desirable to use a linker group that is cleavable during or upon internalization into a cell. A number of different cleavable linker groups have been described. The mechanisms for the intracellular release of an agent from these linker groups include cleavage by reduction of a disulfide bond (e.g., U.S. Pat. No. 4,489,710, to Spitler), by irradiation of a photolabile bond (e.g., U.S. Pat. No. 4,625,014, to Senter et al.), by hydrolysis of derivatized amino acid side chains (e.g., U.S. Pat. No. 4,638,045, to Kohn et al.), by serum complement-mediated hydrolysis (e.g., U.S. Pat. No. 4,671,958, to Rodwell et al.), and by acid-catalyzed hydrolysis (e.g., U.S. Pat. No. 4,569,789, to Blattler et al.).

[0149] Polynucleotides Suitable for Expressing Proteins and/or Polypeptides

[0150] The present invention, in other aspects, provides polynucleotides that encode the recombinant proteins and/or polypeptides disclosed herein above. The terms “DNA” and “polynucleotide” are used essentially interchangeably herein to refer to a DNA molecule that has been isolated free of total genomic DNA of a particular species. “Isolated,” as used herein, means that a polynucleotide is substantially away from other coding sequences, and that the DNA molecule does not contain large portions of unrelated coding DNA, such as large chromosomal fragments or other functional genes or polypeptide coding regions. Of course, this refers to the DNA molecule as originally isolated, and does not exclude genes or coding regions later added to the segment by the hand of man.

[0151] Polynucleotides may comprise a native sequence (i.e., an endogenous sequence that encodes a protein and/or polypeptide, for example an antibody, or portion thereof) or may comprise a sequence that encodes a variant or derivative, preferably an immunogenic variant or derivative, of such a sequence. In certain embodiments, the polynucleotide sequences may encode immunogenic polypeptides, as described above.

[0152] Typically, polynucleotide variants will contain one or more substitutions, additions, deletions and/or insertions, preferably such that the immunogenicity of the polypeptide encoded by the variant polynucleotide is not substantially diminished relative to a polypeptide encoded by a polynucleotide sequence specifically set forth herein. The term “variants” should also be understood to encompass homologous genes of xenogeneic origin.

[0153] The polynucleotides of the present invention, or fragments thereof, regardless of the length of the coding sequence itself, may be combined with other DNA sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol. For example, illustrative polynucleotide segments with total lengths of about 10,000, about 5,000, about 3,000, about 2,000, about 1,000, about 500, about 200, about 100, about 50 base pairs in length, and the like, (including all intermediate lengths) are contemplated to be useful in many implementations of this invention.

[0154] Polynucleotides suitable for high-level, large-scale expression according to the present invention may be identified, prepared and/or manipulated using any of a variety of well established techniques (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y., 1989, and other like references). For example, a polynucleotide may be identified by screening a microarray of cDNAs for tumor-associated expression. Such screens may be performed, for example, using the microarray technology of Affymetrix, Inc. (Santa Clara, Calif.) according to the manufacturer's instructions (and essentially as described by Schena et al., Proc. Natl. Acad. Sci. USA 93:10614-10619, 1996 and Heller et al., Proc. Natl. Acad. Sci. USA 94:2150-2155, 1997). Alternatively, polynucleotides may be amplified from cDNA prepared from cells expressing the proteins described herein, such as tumor cells.

[0155] Many template dependent processes are available to amplify a target sequence of interest present in a sample. One of the best known amplification methods is the polymerase chain reaction (PCR™) which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159, each of which is incorporated herein by reference in its entirety. Briefly, in PCR™, two primer sequences are prepared which are complementary to regions on opposite complementary strands of the target sequence. An excess of deoxynucleoside triphosphates is added to a reaction mixture along with a DNA polymerase (e.g., Taq polymerase). If the target sequence is present in a sample, the primers will bind to the target and the polymerase will cause the primers to be extended along the target sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended primers will dissociate from the target to form reaction products, excess primers will bind to the target and to the reaction product and the process is repeated. Preferably, a reverse transcription and PCR™ amplification procedure may be performed in order to quantify the amount of mRNA amplified. Polymerase chain reaction methodologies are well known in the art.
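
The primer geometry and cycling described above can be illustrated with a short Python sketch. It locates the region bounded by a forward primer and a reverse primer on a template strand and computes the idealized copy number after a given number of cycles. The template, the primers, and the function names are hypothetical; exact primer matching and a fixed per-cycle efficiency are simplifying assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Return the template region bounded by the two primers, or None.

    fwd_primer matches the given (top) strand; rev_primer matches the
    opposite strand, so its reverse complement is searched on the top strand.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def copies_after(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized product count: each cycle multiplies copies by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical 30-nt template and 7-nt primers, for illustration only.
template = "TTGACCATGGCTAGCTAGGATCCGTTAACC"
amplicon = find_amplicon(template, fwd_primer="ACCATGG", rev_primer="CGGATCC")
print(amplicon, len(amplicon))
print(copies_after(1, 30))  # perfect doubling over 30 cycles
```

Real reactions fall short of perfect doubling, so an `efficiency` below 1.0 gives a more conservative estimate.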

[0156] Any of a number of other template dependent processes, many of which are variations of the PCR™ amplification technique, are readily known and available in the art. Illustratively, some such methods include the ligase chain reaction (referred to as LCR), described, for example, in Eur. Pat. Appl. Publ. No. 320,308 and U.S. Pat. No. 4,883,750; Qbeta Replicase, described in PCT Intl. Pat. Appl. Publ. No. PCT/US87/00880; Strand Displacement Amplification (SDA) and Repair Chain Reaction (RCR). Still other amplification methods are described in Great Britain Pat. Appl. No. 2 202 328, and in PCT Intl. Pat. Appl. Publ. No. PCT/US89/01025. Other nucleic acid amplification procedures include transcription-based amplification systems (TAS) (PCT Intl. Pat. Appl. Publ. No. WO 88/10315), including nucleic acid sequence based amplification (NASBA) and 3SR. Eur. Pat. Appl. Publ. No. 329,822 describes a nucleic acid amplification process involving cyclically synthesizing single-stranded RNA (“ssRNA”), ssDNA, and double-stranded DNA (dsDNA). PCT Intl. Pat. Appl. Publ. No. WO 89/06700 describes a nucleic acid sequence amplification scheme based on the hybridization of a promoter/primer sequence to a target single-stranded DNA (“ssDNA”) followed by transcription of many RNA copies of the sequence. Other amplification methods such as “RACE” (Frohman, 1990), and “one-sided PCR” (Ohara, 1989) are also well-known to those of skill in the art.

[0157] An amplified portion of a polynucleotide of the present invention may be used to isolate a full length gene from a suitable library (e.g., a tumor cDNA library) using well known techniques. Within such techniques, a library (cDNA or genomic) is screened using one or more polynucleotide probes or primers suitable for amplification. Preferably, a library is size-selected to include larger molecules. Random primed libraries may also be preferred for identifying 5′ and upstream regions of genes. Genomic libraries are preferred for obtaining introns and extending 5′ sequences. Alternatively, or in addition, essentially any amplified polynucleotide may be employed in routine subcloning techniques in order to arrive at a UCOE-based vector according to this invention.

[0158] For hybridization techniques, a partial sequence may be labeled (e.g., by nick-translation or end-labeling with ³²P) using well known techniques. A bacterial or bacteriophage library is then generally screened by hybridizing filters containing denatured bacterial colonies (or lawns containing phage plaques) with the labeled probe (see Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y., 1989). Hybridizing colonies or plaques are selected and expanded, and the DNA is isolated for further analysis. cDNA clones may be analyzed to determine the amount of additional sequence by, for example, PCR using a primer from the partial sequence and a primer from the vector. Restriction maps and partial sequences may be generated to identify one or more overlapping clones. The complete sequence may then be determined using standard techniques, which may involve generating a series of deletion clones. The resulting overlapping sequences can then be assembled into a single contiguous sequence. A full length cDNA molecule can be generated by ligating suitable fragments, using well known techniques.

[0159] Alternatively, amplification techniques, such as those described above, can be useful for obtaining a full length coding sequence from a partial cDNA sequence. One such amplification technique is inverse PCR (see Triglia et al., Nucl. Acids Res. 16:8186, 1988), which uses restriction enzymes to generate a fragment in the known region of the gene. The fragment is then circularized by intramolecular ligation and used as a template for PCR with divergent primers derived from the known region. Within an alternative approach, sequences adjacent to a partial sequence may be retrieved by amplification with a primer to a linker sequence and a primer specific to a known region. The amplified sequences are typically subjected to a second round of amplification with the same linker primer and a second primer specific to the known region. A variation on this procedure, which employs two primers that initiate extension in opposite directions from the known sequence, is described in WO 96/38591. Another such technique is known as “rapid amplification of cDNA ends” or RACE. This technique involves the use of an internal primer and an external primer, which hybridizes to a polyA region or vector sequence, to identify sequences that are 5′ and 3′ of a known sequence. Additional techniques include capture PCR (Lagerstrom et al., PCR Methods Applic. 1:111-19, 1991) and walking PCR (Parker et al., Nucl. Acids. Res. 19:3055-60, 1991). Other methods employing amplification may also be employed to obtain a full length cDNA sequence.

[0160] In certain instances, it is possible to obtain a full length cDNA sequence by analysis of sequences provided in an expressed sequence tag (EST) database, such as that available from GenBank. Searches for overlapping ESTs may generally be performed using well known programs (e.g., NCBI BLAST searches), and such ESTs may be used to generate a contiguous full length sequence. Full length DNA sequences may also be obtained by analysis of genomic fragments.
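
The idea of joining overlapping ESTs into a contiguous sequence can be shown with a toy Python sketch. It merges two reads at their longest exact suffix/prefix overlap. This is not how BLAST works and it does not tolerate mismatches or sequencing errors; the function name, the minimum overlap, and the sequences are hypothetical and serve only to illustrate the principle.

```python
def merge_overlap(left: str, right: str, min_overlap: int = 4):
    """Merge two sequences where a suffix of `left` equals a prefix of `right`.

    The longest exact overlap of at least `min_overlap` bases is used.
    Returns the merged sequence, or None if no such overlap exists.
    """
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Hypothetical EST fragments sharing the five-base overlap "CAAGT".
est_a = "ATGGCCAAGT"
est_b = "CAAGTTGACC"
contig = merge_overlap(est_a, est_b)
print(contig)
```

Repeating the merge across a set of overlapping fragments would extend the contig step by step toward a full length sequence.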

[0161] In certain preferred embodiments of the invention, polynucleotide sequences or fragments thereof are employed in the construction and/or use of UCOE-based vectors and encode one or more polypeptides of interest, such as antibodies or fusion proteins or functional equivalents thereof. Due to the inherent degeneracy of the genetic code, other DNA sequences that encode substantially the same or a functionally equivalent amino acid sequence may be produced and these sequences may be used to clone and express a given polypeptide.

[0162] As will be understood by those of skill in the art, it may be advantageous in some instances to produce polypeptide-encoding nucleotide sequences possessing non-naturally occurring codons. For example, codons preferred by a particular prokaryotic or eukaryotic host can be selected to increase the rate of protein expression or to produce a recombinant RNA transcript having desirable properties, such as a half-life which is longer than that of a transcript generated from the naturally occurring sequence.
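
Because the genetic code is degenerate, a coding sequence can be rewritten with host-preferred codons without changing the encoded amino acids. The Python sketch below illustrates this for a five-residue peptide. The codon assignments are taken from the standard genetic code, but the choice of which synonymous codon is "preferred" is a hypothetical assumption made for the example, and the tables cover only the residues used here.

```python
# Partial standard genetic code, limited to the codons used in this example.
CODON_TO_AA = {
    "ATG": "M",
    "GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "CTA": "L", "CTG": "L",
    "GAA": "E", "GAG": "E",
}

# Hypothetical host preference: one synonymous codon chosen per amino acid.
PREFERRED_CODON = {"M": "ATG", "A": "GCC", "K": "AAG", "L": "CTG", "E": "GAG"}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode_for_host(dna: str) -> str:
    """Swap each codon for the host-preferred synonymous codon."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(dna))


native = "ATGGCAAAATTAGAA"           # encodes M-A-K-L-E
optimized = recode_for_host(native)  # same peptide, different codons
print(native, "->", optimized)
print(translate(native) == translate(optimized))
```

The nucleotide sequence changes at four of the five codons while the encoded peptide is unchanged.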

[0163] Moreover, the polynucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter polypeptide encoding sequences for a variety of reasons, including but not limited to, alterations which modify the cloning, processing, and/or expression of the gene product. For example, DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences. In addition, site-directed mutagenesis may be used to insert new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, or introduce mutations, and so forth.

[0164] A newly synthesized peptide may be substantially purified, forexample, by preparative high performance liquid chromatography (e.g.,Creighton, T. (1983) Proteins, Structures and Molecular Principles, WHFreeman and Co., New York, N.Y.) or other comparable techniquesavailable in the art. The composition of the synthetic peptides may beconfirmed by amino acid analysis or sequencing (e.g., the Edmandegradation procedure). Additionally, the amino acid sequence of apolypeptide, or any part thereof, may be altered during direct synthesisand/or combined using chemical methods with sequences from otherproteins, or any part thereof, to produce a variant polypeptide.

[0165] The following Examples are offered by way of illustration notlimitation.

EXAMPLES

Example 1 Expression of Recombinant Antibody in a UCOE-Based Expression Vector System

[0166] This example discloses a comparison between the expression levels of recombinant antibodies using vectors with and without UCOEs.

[0167] Engineered human antibody Ab3 was expressed from vectors containing a human RNP UCOE as shown in FIG. 1. Identical vectors, but without the UCOE element, were also constructed. The Ig heavy chain coding sequence in this example comprises an engineered human V-region sequence introduced upstream of and in frame with a genomic DNA fragment encoding a human Ig gamma-1 constant region. The Ig light chain coding sequence comprises an engineered human V-region sequence introduced upstream of and in frame with a cDNA fragment encoding a human Ig kappa constant region. The vector for expression of the Ig heavy chain additionally contains a neo selectable marker gene and the vector for expression of the Ig light chain contains a hygromycin selectable marker. See FIG. 2A.

[0168] CHO-K1 cells were co-transfected with the light-chain and heavy-chain vectors using lipofectamine (Life Technologies) according to the manufacturer's instructions. Cells were selected using hygromycin and G418. Pools of transfectants were maintained and levels of assembled immunoglobulin secreted into culture medium were determined by ELISA at various times post-transfection (FIG. 3). In the absence of the RNP UCOE, antibody expression levels were low (approximately 48 ng/ml) 48 hours after transfection and declined thereafter. In contrast, in transfection pools from expression vectors containing the RNP UCOE, antibody levels continued to accumulate as the transfected cultures were expanded, reaching 3 micrograms/ml 15 days post-transfection. Thus, use of UCOEs permitted rapid generation of pools of transfected cells that express high levels of recombinant immunoglobulin.

Example 2 High-level, Large-scale Expression Achieved in CHO Host Cell-line Transfected with UCOE-Based Expression Vector System

[0169] CHO-S cells were co-transfected with vectors containing UCOE antibody expression cassettes (shown in FIG. 1) to produce the engineered human antibody Ab1. The Ig heavy chain coding sequence comprises an engineered human V-region sequence introduced upstream of and in frame with a genomic DNA fragment encoding a human Ig gamma-4 constant region. The Ig light chain coding sequence comprises an engineered human V-region sequence introduced upstream of and in frame with a cDNA fragment encoding a human Ig kappa constant region. The vector for expression of the Ig heavy chain additionally contains a neo selectable marker gene and the vector for expression of the Ig light chain contains a hygromycin selectable marker. See FIG. 2B.

[0170] Transfections were carried out using lipofectamine (Life Technologies) according to the manufacturer's instructions. Cells were selected using hygromycin and G418 in CD-CHO medium (Life Technologies) and subclones were selected. This process took approximately 5 weeks. One subclone was scaled into a 2 L bioreactor to perform final parameter optimization before being scaled into a 100 L bioreactor. Production rates from the majority of transfectants expressing recombinant antibodies were typically approximately 5 pg/cell/day using this approach. Yields of one antibody in suspension culture reached approximately 200 mg/l. See FIG. 4. The inclusion of the UCOE in the two expression vectors co-transfected into CHO-S cells resulted in rapid isolation of a transfectant clone that could immediately be cultured in suspension in a defined medium.

Example 3 Low Levels of Gal-Gal Residues on CHO-K1 and CHO-S Host Cell-lines

[0171] As discussed hereinabove, the presence of the Galα1→3Galβ1→4GlcNAc-R (Gal-Gal) carbohydrate residue on antibodies used as human therapeutics has been associated with rapid protein clearance from the serum. As a result, the ability to produce recombinant protein without this residue is advantageous. See, e.g., Borrebaeck et al., Immunology Today 14:477-479 (1993) and Kagawa et al., J. Biol. Chem. 263:17508-17515 (1988). Utilizing the FITC-labeled IB₄ lectin and flow cytometry, it was demonstrated that the Gal-Gal residue is not present on the surface of CHO-S cells. See FIG. 5; methodology disclosed in Cho et al., J. Biol. Chem. 272:13622-13628 (1997) and Gorelik et al., Cancer Res. 55:4168-4173 (1995). In this respect, CHO-S resembles the other widely used CHO line tested, CHO-K1. In contrast, the mouse hybridoma cell-line tested in this experiment showed high levels of cell-surface associated Gal-Gal carbohydrate. Mass spectrometry of a purified recombinant protein produced in the cell-line demonstrated the absence of the Gal-Gal residue (data not shown).

Example 4 Bi-Directional UCOE Vectors for Improved Expression Levels of Multi-Subunit Recombinant Proteins

[0172] This Example discloses improved expression of recombinant antibody heavy and light protein chains on bi-directional UCOE vector systems.

[0173] The two Sfi I sites of pORT1 (Cobra Therapeutics) were changed to Mfe I sites by introduction of adapter molecules comprised of annealed oligos Mfe.F, 5′-AACAATTGGCGGC (SEQ ID NO: 10) and Mfe.R, 5′-GCCAATTGTTGCC (SEQ ID NO: 11). The HSV TK polyA site was then amplified from pVgRXR (Invitrogen) with primers TK.F, 5′-ACGCGTCGACGGAAGGAGACAATACCGGAAG (SEQ ID NO: 12) and TK.R, 5′-CCGCTCGAGTTGGGGTGGGGAAAAGGAA (SEQ ID NO: 13), and the Sal I to Xho I fragment was inserted into the Sal I site. Following this, the murine PGK polyA site was amplified from male BALB/c genomic DNA (Clontech) using primers mPGK.F, 5′-CGGGATCCGCCTGAGAAAGGAAGTGAGCTG (SEQ ID NO: 14) and mPGK.R, 5′-GAAGATCTGGAGGAATGAGCTGGCCCTTA (SEQ ID NO: 15), and the BamH I to Bgl II fragment was cloned into the BamH I site. The Ase I to Sal I fragment of pcDNA3.1 containing the neo expression cassette was treated with T4 DNA polymerase, ligated to Spe I linkers (5′-GACTAGTC; SEQ ID NO: 16) and the Spe I fragment was then cloned into the Spe I site to give pORTneoF; or the EcoR I to Not I fragment of CET700 (Cobra Therapeutics) carrying the puromycin resistance cassette was treated with T4 DNA polymerase, ligated to Xba I linkers, and the Xba I fragment was cloned into the Xba I site to give pORTpuroF. The Hind III to BamH I murine CMV promoter fragment from pCMVEGFPN-1 (Cobra) was subcloned into the Hind III to BamH I sites of the Hybrid UCOE in BKS+ (Cobra). The human CMV promoter was then amplified from plasmid pIRESneo (Clontech) using primers hCMVF, 5′-CTCGAGTTATTAATAGTAATCAATTACGGGGTCAT (SEQ ID NO: 17) and hCMVR, 5′-GTCGACGATCTGACGGTTCACTAAACCAGCTCT (SEQ ID NO: 18) and the Xho I to Sal I fragment was cloned into the Sal I site. The BamH I to Sal I fragment was then cloned into the BamH I to Sal I sites of pORTneoF to give pBDUneo100, or into pORTpuroF to give pBDUpuro300. The two ATG codons upstream of the Sal I cloning site in the Hybrid UCOE in BKS+ were altered by site-directed mutagenesis, then the BamH I to Sal I fragment was cloned into the BamH I to Sal I sites of pORTneoF to give pBDUneo200, or into pORTpuroF to give pBDUpuro400.
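As a quick consistency check on the cloning scheme above, the following sketch scans each oligo listed in this paragraph for the recognition sequence of the enzyme used in the corresponding step. The oligo sequences are copied from the text; the recognition sequences are the standard ones for these enzymes, and because all five are palindromic, scanning the oligo as written is sufficient. The dictionary and function names are illustrative only.

```python
# Standard recognition sequences (all palindromic).
SITES = {
    "MfeI": "CAATTG",
    "SalI": "GTCGAC",
    "XhoI": "CTCGAG",
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
}

# Oligos as given in the paragraph above, written 5' to 3'.
OLIGOS = {
    "Mfe.F": "AACAATTGGCGGC",
    "Mfe.R": "GCCAATTGTTGCC",
    "TK.F": "ACGCGTCGACGGAAGGAGACAATACCGGAAG",
    "TK.R": "CCGCTCGAGTTGGGGTGGGGAAAAGGAA",
    "mPGK.F": "CGGGATCCGCCTGAGAAAGGAAGTGAGCTG",
    "mPGK.R": "GAAGATCTGGAGGAATGAGCTGGCCCTTA",
    "hCMVF": "CTCGAGTTATTAATAGTAATCAATTACGGGGTCAT",
    "hCMVR": "GTCGACGATCTGACGGTTCACTAAACCAGCTCT",
}


def sites_in(seq):
    """Return the names of the listed enzymes whose site occurs in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)


for name, seq in OLIGOS.items():
    print(f"{name:7s} {', '.join(sites_in(seq))}")
```

Each oligo carries the site that the text says is used for its cloning step (for example, TK.F carries Sal I and TK.R carries Xho I, matching the "Sal I to Xho I fragment").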

[0174] Human antibody light chains were cloned into either the BamH I or Sal I sites of all four bi-directional UCOE vectors (pBDUneo100, pBDUneo200, pBDUpuro300 and pBDUpuro400; FIGS. 6-9 and SEQ ID NOs: 1-4, respectively), followed by the heavy chain at the remaining BamH I or Sal I cloning site to give pBDUneo112, pBDUneo121, pBDUneo212, pBDUneo221, pBDUpuro312, pBDUpuro321, pBDUpuro412 and pBDUpuro421.

[0175] Additional bi-directional UCOE vectors suitable for co-expression of two or more recombinant proteins are disclosed in FIGS. 10-13 (SEQ ID NOs: 5-8) and are referred to as pBDUneo500, pBDUneo600, pBDUpuro700 and pBDUpuro800, respectively. These vectors may be employed, for example, to optimize the hybrid UCOE orientation for antibody expression, as well as to provide alternative promoter combinations for optimization.

[0176] Plasmid pORTpuroF was digested with XbaI (partial) and NsiI to remove the bovine growth hormone polyA site, then ligated to the SV40 early polyA site which was amplified with primers 14506, 5′-CCAATGCATAGGTTGGGCTTCGGGAATCGT (SEQ ID NO: 19) and 14507, 5′-GCTCTAGATCTCGACGGTATACAGACATGAT (SEQ ID NO: 20) followed by digestion with XbaI and NsiI, to give plasmid pORTpuroF2. The Hybrid UCOE vector containing the murine CMV promoter downstream of the human RNP UCOE and with the two mutated ATG codons between the actin promoter and the Sal I site, was digested with BamHI and HindIII to remove the murine CMV promoter, then ligated to the human CMV promoter that had been amplified with primers 14425, 5′-CCCAAGCTTATTAATAGTAATCAATTACGGGGTCAT (SEQ ID NO: 21) and 14426, 5′-CAAGGATCCGATCTGACGGTTCACTAAACCAGCTCT (SEQ ID NO: 22) followed by digestion with BamHI and HindIII. An adapter comprised of annealed oligos 14466, 5′-TCGAGTCGTTTAAACTCTAG (SEQ ID NO: 23) and 14465, 5′-TCGACTAGAGTTTAAACGAC (SEQ ID NO: 24) was then inserted at the SalI site, digested with PmeI and SalI, and ligated to the murine CMV promoter that had been amplified with primers 14435, 5′-GAATTCGAGCTCGCCCAACTCCGCCCGTTTTAT (SEQ ID NO: 25) and 14436, 5′-ATTTGTCGACTCTAGACCCGGGCTGCAGCGAGGAGCTCT (SEQ ID NO: 26) followed by digestion with SalI. The plasmid either with, or without, the murine CMV promoter was then digested with BamHI and SalI, and ligated to BamHI and SalI digested pORTneoF to give plasmids pBDUneo500 and pBDUneo600; or was ligated to BamHI and SalI digested plasmid pORTpuroF2 to give plasmids pBDUpuro700 and pBDUpuro800, respectively.
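The adapter formed from oligos 14466 and 14465 can be examined with a short sketch that anneals the two strands in silico and reports the single-stranded ends. The oligo sequences are taken from the paragraph above; the helper functions are illustrative, and the `anneal` function assumes the simple geometry in which each strand carries a 5′ overhang followed by a fully complementary core.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(top, bottom):
    """Anneal two oligos (both written 5'->3') that form 5' overhangs.

    Returns (top 5' overhang, duplex core as read on the top strand,
    bottom 5' overhang). Raises ValueError if no complementary core exists.
    """
    rc_bottom = reverse_complement(bottom)
    # Longest suffix of the top strand that equals a prefix of rc(bottom).
    for k in range(min(len(top), len(bottom)), 0, -1):
        if top[-k:] == rc_bottom[:k]:
            return top[:len(top) - k], top[-k:], bottom[:len(bottom) - k]
    raise ValueError("oligos do not anneal with 5' overhangs")


oligo_14466 = "TCGAGTCGTTTAAACTCTAG"
oligo_14465 = "TCGACTAGAGTTTAAACGAC"

top_overhang, duplex, bottom_overhang = anneal(oligo_14466, oligo_14465)
print(top_overhang, duplex, bottom_overhang)
print("PmeI site present:", "GTTTAAAC" in duplex)
```

Under these assumptions the annealed adapter has a 16 bp core containing a PmeI site (GTTTAAAC) and a 5′-TCGA overhang at each end, which is the overhang left by SalI digestion, consistent with insertion at the SalI site followed by digestion with PmeI and SalI.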

[0177] G418 or puromycin-resistant bi-directional UCOE vectors expressing antibody heavy and light chains were transfected into CHO-K1 or CHO-S cells using Lipofectamine or DMRIE-C (Invitrogen), respectively, following the manufacturer's instructions, and selected with 500 μg/ml G418 (neo vectors) or 12.5 μg/ml puromycin (puro vectors). Pools were selected and antibody production rates compared between the different constructs to determine the optimal promoter and selectable marker combination for antibody expression in CHO cells.

[0178] The results of expression studies in CHO-S suspension cells are depicted in Table 2. These data demonstrated that vectors containing the light chain expressed from the murine CMV promoter gave the best antibody expression. Vectors containing puromycin or G418-resistance markers were used. Additionally, two bi-directional vectors, one containing a puromycin-resistance marker and one containing a G418-resistance marker, were co-transfected. Pools were selected, and antibody production rates determined. Separately, the G418 or puromycin-resistant transfectant pools displayed similar production rates, but the production rate of the co-transfected pool was significantly higher. This suggests that it may be possible to increase production rate by having two copies of the antibody expression vector, maintained with different selectable markers. Selecting pools with higher levels of puromycin (25-50 μg/ml versus 12.5 μg/ml) did not correlate with increased production.

[0179] Clonal lines were isolated from the puromycin-resistant pool carrying pBDUpuro421. Fifteen out of twenty-two clonal cell lines expressed measurable amounts of antibody. Initial production-rate determinations indicated that the cell lines had antibody secretion rates of up to 16 pg/cell/day (Table 3). Southern blot analysis identified at least one clone (clone S421.7) that has a production rate of 13 pg/cell/day and approximately a single copy of the vector DNA. Clones from this pool were isolated with production rates of 3-18 pg/cell/day. Clones expressing approx. 5 pg/cell/day were used for initial fermentation experiments.

TABLE 2 Expression of hAb1 (IgG4) from bi-directional UCOE vectors

Vector                      H3 Promoter         K1 Promoter         Production Rate (pg/cell/day)
pBDUneo112                  murine CMV          human CMV           0.3
pBDUneo121                  human CMV           murine CMV          1.5
pBDUneo212                  murine CMV          human beta-actin    0.06
pBDUneo221                  human beta-actin    murine CMV          1.3
pBDUpuro312                 murine CMV          human CMV           0.5
pBDUpuro321                 human CMV           murine CMV          1.4
pBDUpuro412                 murine CMV          human beta-actin    0.05
pBDUpuro421                 human beta-actin    murine CMV          2.3
Cotransfection**            human CMV           human CMV           0.7
pBDUneo221                  human beta-actin    murine CMV          1.3
pBDUpuro421                 human beta-actin    murine CMV          1
pBDUneo221 + pBDUpuro421    human beta-actin    murine CMV          5

[0180] TABLE 3 Expression of hAb1 in clonal CHO-S cell lines transfected with pBDUpuro421

Puromycin^(R) Cell Line    Production Rate (pg/cell/day)
S421.2                     5.4
S421.3                     0.5
S421.4                     0.5
S421.7                     13.4
S421.8                     5.4
S421.9                     0.04
S421.12                    1.4
S421.14                    6.7
S421.15                    0.3
S421.16                    7.2
S421.17                    5
S421.18                    0.8
S421.20                    1.2
S421.21                    0.3
S421.22                    16

Example 5 Deletion Analysis of the RNP UCOE

[0181] This Example discloses polynucleotide deletions within an RNP UCOE plasmid vector for improved expression of recombinant proteins. Briefly, a series of deletions within the 8 kb RNP UCOE were prepared to identify both important functional elements and regions that may be removed without affecting UCOE function. A green fluorescent protein gene (GFP) was cloned into plasmid CET720 (Cobra Therapeutics), and deletions were subsequently introduced into the UCOE region (FIG. 14). The first set of these deletions was transfected into CHO-S cells, and examined for the ability to express GFP. In a transient assay (two days post transfection), all of the plasmids were able to express GFP as determined by fluorescence microscopy. Stable pools carrying the different constructs were then selected, and GFP expression determined by FACS analysis. One month post-transfection, all of the deletions displayed both a higher percentage of positive cells than a control plasmid which did not contain the UCOE (>50% versus 10% without the UCOE), and a higher mean fluorescence for the positive population than the control vector that did not contain the UCOE (Table 4).

[0182] These data defined more precisely the region of the human RNP UCOE required for full activity and identified a shorter (approximately 7 kb) UCOE element with full activity. This new 7 kb UCOE element was defined by deletion ΔRV and extends from nucleotide 2225 to nucleotide 9254 in FIG. 14.

TABLE 4 GFP expression from plasmids containing deletions within the 8 kb RNP UCOE

Plasmid                  Region Deleted     Percent Positive    Mean Fluorescence of Positive Population
CET720GFP (8 kb UCOE)    None               68                  516
CET700GFP (no UCOE)      nt. 2225-10525     10                  136
ΔBS (4 kb UCOE)          nt. 2225-6341      61                  370
ΔEcoNI                   nt. 3875-6916      65                  439
ΔEX2                     nt. 6916-7053      53                  384
ΔEM                      nt. 6916-7209      66                  423
ΔMX                      nt. 7053-7209      66                  464
ΔMluI                    nt. 7209-8293      58                  448
ΔRV                      nt. 9254-10342     72                  548

[0183] Vector CET720GFP (represented by SEQ ID NO: 9, which contains the 8 kb human RNP UCOE) was digested with EcoRV, MluI, EcoNI, or BamHI plus SalI, the ends were blunted with T4 DNA polymerase and religated to produce vectors deltaRV, deltaMluI, deltaEcoNI and deltaBS, respectively. CET720 was digested with PflMI and blunted with T4 DNA polymerase, then cut with BamHI. The blunt to BamHI fragment was cloned into the EcoRV to BamHI sites of pBluescript II SK (+) to give pPB720. pPB720 was digested with EcoNI and MluI, MluI and XhoI (partial), or EcoNI and XhoI (partial), the ends were treated with T4 DNA polymerase and recircularized. The PshAI fragment from each of the resulting vectors was cloned into the PshAI sites of CET720GFP to give illustrative vectors deltaEM, deltaMX and deltaEX, respectively.

Example 6 Additional Deletion Analysis of the RNP UCOE

[0184] Previous examples have identified via deletion analysis that the UCOE regions from nucleotides 2225-6916 and 9254-10342 of vector CET720GFP (SEQ ID NO:9) can be removed without loss of UCOE activity (see Example 5 above). In this example, minimal regions of the 8 kb RNP UCOE that are important for its activity are further defined. Importantly, this analysis more precisely defined an illustrative 4.1 kb region of the human RNP UCOE that retains full activity.

[0185] Briefly, fragments of the 8 kb RNP UCOE were blunted and ligated to HindIII linkers (New England Biolabs; Catalog Number S1098S), digested with HindIII and ligated to HindIII digested and calf-intestinal alkaline phosphatase-treated vector CET700GFP. Vectors were transfected into CHO-S cells using DMRIE-C (Invitrogen), where all constructs were capable of expressing GFP in a transient assay (data not shown). After 2 weeks in puromycin selection, the geometric mean fluorescence of the positive population was determined by FACS, and expressed as a percentage of the control (CET720GFP), the results of which are summarized in Table 5 below. Vector 700FRV.R, which contains a 4.1 kb MfeI to EcoRV fragment of the RNP UCOE, corresponding to nucleotide residues 5152-9254 of CET720GFP, retained full UCOE activity relative to the 8 kb UCOE region of nucleotide residues 2225-10525 of CET720GFP. Thus, this 4.1 kb UCOE fragment represents a new minimal UCOE element that retains activity at levels comparable to that for the full 8 kb UCOE element.

TABLE 5

Plasmid                  UCOE Region Present        Percent of Control
CET720GFP (8 kb UCOE)    Nucleotides 2225-10525     100
CET700GFP (no UCOE)      None                       10
delta RV                 Nucleotides 2225-9254      99
                         Nucleotides 10342-10525
700HRV.R                 Nucleotides 2240-9254      121
700FRV.R                 Nucleotides 5152-9254      122
700BRV.R                 Nucleotides 6341-9254      73
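The fragment sizes quoted in this example follow directly from the nucleotide coordinates given for CET720GFP. The sketch below simply subtracts the coordinates; whether the endpoints are inclusive is not stated in the text, so the result is an approximation to within a base pair, and the dictionary labels are informal descriptions rather than names used in the specification.

```python
# Coordinates (start, end) on CET720GFP as listed in the text and tables.
REGIONS = {
    "full RNP UCOE (CET720GFP)": (2225, 10525),
    "region retained up to EcoRV (delta RV)": (2225, 9254),
    "700HRV.R": (2240, 9254),
    "700FRV.R": (5152, 9254),
    "700BRV.R": (6341, 9254),
}


def span_kb(start, end):
    """Approximate fragment length in kb from its two coordinates."""
    return (end - start) / 1000


for label, (start, end) in REGIONS.items():
    print(f"{label:40s} {span_kb(start, end):.1f} kb")
```

The MfeI to EcoRV fragment (5152-9254) comes out at about 4.1 kb and the 2225-9254 region at about 7 kb, matching the sizes stated in Examples 5 and 6.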

[0186] Activity was also determined for the three UCOE fragments contained within 700HRV.R, 700FRV.R and 700BRV.R, but with the UCOE fragments inserted in the opposite orientation, to give plasmids 700HRV.F, 700FRV.F and 700BRV.F, respectively. Again, all plasmids were capable of expressing GFP in a transient assay. After 3 weeks in puromycin selection, the geometric mean fluorescence of the positive population was determined by FACS, and expressed as a percentage of the control (CET720GFP), the results of which are summarized in Table 6 below. While lower levels of activity were observed for plasmids containing UCOE in the opposite orientation, all fragments nonetheless retained UCOE activity.

TABLE 6

Plasmid                  UCOE Region Present        Percent of Control
CET720GFP (8 kb UCOE)    Nucleotides 2225-10525     100
CET700GFP (no UCOE)      None                       6
700HRV.F                 Nucleotides 2240-9254      59
700FRV.F                 Nucleotides 5152-9254      43
700BRV.F                 Nucleotides 6341-9254      30

Example 7 Preparation of Additional Illustrative Bi-Directional UCOE Vectors

[0187] Previous examples have described the preparation and evaluation of numerous illustrative UCOE vectors. In this example, additional UCOE vectors were constructed. For example, vectors pBDUpuro350 (SEQ ID NO: 27) and pBDUpuro450 (SEQ ID NO: 28) were prepared so as to be equivalent to the previously described vectors pBDUpuro300 and pBDUpuro400, with the exception that the polyA site following the puromycin resistance gene was replaced with the SV40 polyA site (see also FIGS. 15 and 16). Several additional vectors will replace the 8 kb RNP UCOE element with the 4.1 kb MfeI-EcoRV fragment identified hereinabove by deletion analysis to retain full UCOE activity. To alter the polyA site of the puromycin resistance cassette of the pBDUpuro vector series, the SV40 polyA site was amplified from pBSneo.23 by polymerase chain reaction and the reaction product was digested with NsiI and XbaI and inserted into the NsiI to XbaI site of pORTpuroF to replace the BGH polyA site. This new vector, pORTpuroF′, was sequentially digested with BamHI and SalI, and cloned into the BamHI to SalI sites of HUCMV (hybrid UCOE with murine CMV promoter) to give plasmid pBDUpuro350 (SEQ ID NO: 27; see also FIG. 15), or cloned into the BamHI site of pUCOEact3 (hybrid UCOE with site directed mutagenesis of the ATG codons in the actin promoter) to give pBDUpuro450 (SEQ ID NO: 28; see also FIG. 16). Additional UCOE vectors are constructed by inserting a HindIII site at the position of the KpnI site at the border between the human beta-actin and RNP UCOE fragments in plasmids pUCOEact3 and pUCOEact3hCMV. The 4 kb HindIII fragment carrying the RNP UCOE is then removed and replaced with the 4.1 kb RNP UCOE fragment from 700FRV.R. The SalI to BamHI (partial) fragments are then cloned into the SalI to BamHI sites of pORTneoF and pORTpuroF′ to give pBDUpuro1200 (SEQ ID NO: 29; see also FIG. 17), pBDUpuro1450 (SEQ ID NO: 30; see also FIG. 18), pBDUneo1600 (SEQ ID NO: 31; see also FIG. 19) and pBDUpuro1800 (SEQ ID NO: 32; see also FIG. 20).

Example 8 Evaluation of Vector Features Important for Bi-Directional UCOE Activity

[0188] 1. Effect of Bi-directional UCOE Vector Copy Number on Antibody Production Rate in CHO-S Cells:

[0189] CHO-S cell line S421.7 has been shown to contain a single copy of vector pBDUpuro421, which expresses hAb1 (IgG4). To determine if additional vector copies could increase antibody expression levels, S421.7 was retransfected with vector pBDUneo221, which also expresses hAb1 but carries a different selectable marker (G418 resistance). Clonal cell lines were isolated and analyzed for production rate (FIG. 21). Many cell lines appear to have higher production rates than the parental line S421.7, indicating that additional vector copies can increase production. Initial copy number analysis indicated that cell lines S7.16, S7.20 and S7.23 contain 1-2 copies of vector pBDUneo221 (data not shown).

[0190] 2. Effect of Hybrid UCOE Orientation and Promoter Choice on Antibody Production in CHO-S Cells

[0191] Stable pools of CHO-S cells carrying various bi-directional UCOE vectors expressing hAb1 (IgG4) were analyzed to determine both the effect of the orientation of the hybrid UCOE relative to the antibody genes, and the effect of different promoters on antibody expression rates. CHO-S cells were transfected with a series of bi-directional UCOE vectors expressing hAb1 (IgG4), and stable pools were selected with either 12.5 μg/ml puromycin or 500 μg/ml G418. The location of the heavy chain (H) and the light chain (K) relative to the hybrid UCOE element (actin end versus RNP end) and the promoters used are shown in Table 7 below. Antibody production rates were measured by ELISA, and western blot analysis was performed to determine the distribution of light chain and heavy chain in the supernatant (supe) versus the cell lysate (lysate). The orientation of the hybrid UCOE showed only minor effects on antibody expression levels; however, the choice of promoter combination resulted in some differences in production rates. The highest production rates were obtained in these experiments using illustrative vectors expressing the heavy chain from the human beta-actin promoter, and the light chain from either the murine CMV or human CMV promoters (e.g., pBDUpuro454 and pBDUpuro804).

TABLE 7

Vector         Actin End    RNP End    Heavy Chain    Heavy Chain    Kappa Chain    Kappa Chain    Prod. Rate
                                       (supe)         (lysate)       (supe)         (lysate)       (pg/cell/day)
pBDUpuro352    hCMV-K       mCMV-H     +              ++             +              −              0.159
pBDUpuro354    hCMV-H       mCMV-K     +              +              +++            +              0.256
pBDUpuro452    actin-K      mCMV-H     +/−            ++             +/−            −              0.0056
pBDUpuro454    actin-H      mCMV-K     ++             +              +++            ++             0.657
pBDUpuro702    hCMV-K       mCMV-H     ++             ++             ++             +              0.391
pBDUpuro704    hCMV-H       mCMV-K     ++             ++             ++             +/−            0.170
pBDUpuro802    actin-K      mCMV-H     +/−            +++            +/−            −              0.020
pBDUpuro804    actin-H      mCMV-K     +++            +++            +++            ++             0.608

[0192] 3. Transcription Versus Production Rates in CHO-S Cells

[0193] Clonal cell lines were isolated from the puromycin resistant pools carrying pBDUpuro452, pBDUpuro454 and pBDUpuro804. Approximately two thirds of clonal lines carrying pBDUpuro454 and pBDUpuro804 had measurable antibody production rates from 1 to 10 pg/cell/day, similar to previous results obtained with vector pBDUpuro421 (data not shown). TaqMan assays on genomic DNA samples suggested that clonal lines S452.3, S454.5 and S804.4 carried single copies of the bidirectional UCOE vectors pBDUpuro452, pBDUpuro454 and pBDUpuro804, respectively. Cell line S421.7, previously shown by Southern analysis to have a single copy of pBDUpuro421 (pBDUpuro400 with the heavy chain expressed from the human actin promoter, and the light chain from the murine CMV promoter), was included as a control. To study the correlation between production rate and transcription of the antibody chains, TaqMan RT-PCR assays were carried out on these lines, the results of which are summarized in Table 8 below. Both heavy and light chain RNA levels in line S452.3 were significantly lower than those observed in the control lines D6 and S421.7, which have been shown to express antibody well. However, lines S454.5 and S804.4 had RNA levels as well as production levels similar to the positive control lines. Together with western blot analysis (data not shown), these results indicate that the RNA levels of antibody heavy and light chains observed in these lines correlate with the production rates observed.

TABLE 8

Cell Line    Production Rate (pg/cell/day)    Light Chain (Ct)    Heavy Chain (Ct)
CHO-S        0                                40                  40
D6           5.5                              20.39               22.86
S421.7       4.57                             21.91               23.90
S454.5       3.52                             22.12               23.96
S804.4       3.62                             22.40               24.11
S452.3       0.07                             29.62               26.47
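To give a rough sense of scale for the Ct differences in Table 8, the sketch below converts a Ct difference into an approximate fold difference in template abundance. It assumes that each PCR cycle doubles the product (100% amplification efficiency) and that equal amounts of RNA were assayed for each line; the specification does not state the efficiency or describe normalization to a reference gene, so the numbers are indicative only. The function name is illustrative.

```python
# Ct values from Table 8: (light chain Ct, heavy chain Ct).
CT = {
    "S421.7": (21.91, 23.90),
    "S452.3": (29.62, 26.47),
}


def fold_lower(ct_sample, ct_reference):
    """Approximate fold reduction in template, assuming perfect doubling per cycle."""
    return 2 ** (ct_sample - ct_reference)


light_fold = fold_lower(CT["S452.3"][0], CT["S421.7"][0])
heavy_fold = fold_lower(CT["S452.3"][1], CT["S421.7"][1])

print(f"S452.3 light chain RNA: about {light_fold:.0f}-fold lower than S421.7")
print(f"S452.3 heavy chain RNA: about {heavy_fold:.0f}-fold lower than S421.7")
```

Under these assumptions, light chain RNA in S452.3 is on the order of 200-fold lower than in S421.7 and heavy chain RNA roughly 6-fold lower, which is in line with the much lower production rate reported for S452.3.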

[0194] U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference in their entirety.

[0195] From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

1 32 1 12701 DNA Artificial Sequence Artificial Sequence containinghuman UCOE elements and vector sequence 1 acgttgtaaa acgacggccagtgaattgta atacgactca ctatagggcg aattgggtac 60 cgggcccccc ctcgaggtcgagttggggtg gggaaaagga agaaacgcgg gcgtattggc 120 cccaatgggg tctcggtggggtatcgacag agtgccagcc ctgggaccga accccgcgtt 180 tatgaacaaa cgacccaacacccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240 cgggttcctt ccggtattgtctccttccgt cgacgatctg acggttcact aaaccagctc 300 tgcttatata gacctcccaccgtacacgcc taccgcccat ttgcgtcaat ggggcggagt 360 tgttacgaca ttttggaaagtcccgttgat tttggtgcca aaacaaactc ccattgacgt 420 caatggggtg gagacttggaaatccccgtg agtcaaaccg ctatccacgc ccattgatgt 480 actgccaaaa ccgcatcaccatggtaatag cgatgactaa tacgtagatg tactgccaag 540 taggaaagtc ccataaggtcatgtactggg cataatgcca ggcgggccat ttaccgtcat 600 tgacgtcaat agggggcgtacttggcatat gatacacttg atgtactgcc aagtgggcag 660 tttaccgtaa atactccacccattgacgtc aatggaaagt ccctattggc gttactatgg 720 gaacatacgt cattattgacgtcaatgggc gggggtcgtt gggcggtcag ccaggcgggc 780 catttaccgt aagttatgtaacgcggaact ccatatatgg gctatgaact aatgaccccg 840 taattgatta ctattaataactcgacggta tcatggtggc gaccggcatg gtgagctgcg 900 agaatagccg ggcgcgctgtgagccgaagt cgcccccgcc ctggccactt ccggcgcgcc 960 gagtccttag gccgccagggggcgccggcg cgcgcccaga ttggggacaa aggaagccgg 1020 gccggccgcg ttattaccataaaaggcaaa cactggtcgg aggcgtcccc gcggcgcgcg 1080 gcaggaagcc aggccccaaccccctcccaa ccgggcgcca gccccgcctc cgcccggttc 1140 aaacagcgac cgggtcgcgcgcgcgcacgc agcggccaca ccctcgggcg ccagcggctc 1200 gggcaggaag tggcgcaagcgcccgggccc cagaacgcac gcgcgattag cgccattgag 1260 tcccagcgcg cacgcgcaattagcgccaat tcccagcgcg cacgcagtta gcgcccaaag 1320 gaccagcgcg cacgcgcatggcgccccagc ccccaccggg cctgacgggg gctacgccgc 1380 gcccaccgtg cgatccccattggcaagagc ccggctcaga caaagacccc gccggttgcc 1440 cccgccccga gagcggcacccccggagcgc gcccgcccga gcgcggcctc gcgcctgcga 1500 actggcgtgg ggtgtcccccatctccggag gcccaggggc ttctcccgcg ccccccacgg 1560 cggtccggtt ccgccccatgcgccccccgc tgcggcccag acggcggctc tgcacgggcg 1620 aagggccgcg 
gccgcatgccccggtcggct ggccgggctt acctggcggc gggtgtggac 1680 gggcggcgga tcggcaaaggcgaggctctg tgctcgcggg cggacgcggt ctcggcggtg 1740 gtggcgcgtc gcgccgctgggttttatagg gcgccgccgc ggccgctcga gccataaaag 1800 gcaactttcg gaacggcgcacgctgattgg ccccgcgccg ctcactcacc ggcttcgccg 1860 cacagtgcag catttttttaccccctctcc cctccttttg cgaaaaaaaa aaagagcgag 1920 agcgagattg aggaagaggaggagggagag ttttggcgtt ggccgccttg gggtgctggg 1980 cccgggggct gggggcgcgcgccgtggccc ccgcgcccca cgctgggcag tgcccggttc 2040 ggccccgcat ggccaggcctgcccccggcc tgcccgtctc tcgggccccc cacccaccgc 2100 gggacatcct aggtgtggacatctcttggg cactgagcgc ccaggtgggg tgggccaggg 2160 tctgcacggg tgccagggccctgggttctg tacgctcctg cagaaggagc tcttggaggg 2220 catggagtgg ccaggcagtcactccccctt gccgacttca gagcaactgc cctgaaagca 2280 gggcctgagg acctctggctgtggggctca gctagctaaa tgtgctgggt gggtcactag 2340 ggagagacct gggcttgagaggtagagtgt ggtgttgggg gagtcaggtg gcttgcggcc 2400 attagagtcg caggaccacactccccagga cagggcaggg gccagcggtc cagtggctgg 2460 aggtggcccg tgatgaaggctacaaaccta cccagccgca gccctgggaa ggaagtgggc 2520 tctacagggc agggcaccttttaccctgga gctgcctgct tttgagggta acagtcacgc 2580 ccagccaaga ccaggcctggggcgttagtg ggtgacctag gcactgcggg gcgggggggc 2640 tgggtctaca cagcctgggtctgggcccac cgtccgttgt atgtctgcta tgcgcagcca 2700 cagctgaact gccctcccagaccatctgga ggccgctggg ggactctggg gaccaagact 2760 ccatgtgcca cagaggattgggggcggggc ggtgctagga actcaaagcc agcctgggaa 2820 gaccctgtcc ttgtcaccctttcttgcctt gggtctgtcc actgagtagc acacaagacc 2880 gggtgggcag ggtccgttctgctccgggaa tcacagactg tgtgtaccca ggtggtgggc 2940 atgcagcgat cagtggcgtgggaccacaga gggggcccgc ggtacctaaa acagcttcac 3000 atggcttaaa ataggggaccaatgtctttt ccaatctaag tcccatttat aataaagtcc 3060 atgttccatt tttaaaggacaatcctttcg gtttaaaacc aggcacgatt acccaaacaa 3120 ctcacaacgg taaagcactgtgaatcttct ctgttctgca atcccaactt ggtttctgct 3180 cagaaaccct ccctctttccaatcggtaat taaataacaa aaggaaaaaa cttaagatgc 3240 ttcaaccccg tttcgtgacactttgaaaaa agaatcacct cttgcaaaca cccgctcccg 3300 acccccgccg ctgaagcccggcgtccagag gcctaagcgc 
gggtgcccgc ccccacccgg 3360 gagcgcgggc ctcgtggtcagcgcatccgc ggggagaaac aaaggccgcg gcacgggggc 3420 tcaagggcac tgcgccacaccgcacgcgcc tacccccgcg cggccacgtt aactggcggt 3480 cgccgcagcc tcgggacagccggccgcgcg ccgccaggct cgcggacgcg ggaccacgcg 3540 ccgccctccg ggaggcccaagtctcgaccc agccccgcgt ggcgctgggg gagggggcgc 3600 ctccgccgga acgcgggtgggggaggggag ggggaaatgc gctttgtctc gaaatggggc 3660 aaccgtcgcc acagctccctaccccctcga gggcagagca gtccccccac taactaccgg 3720 gctggccgcg cgccaggccagccgcgaggc caccgcccga ccctccactc cttcccgcag 3780 ctcccggcgc ggggtccggcgagaagggga ggggagggga gcggagaacc gggcccccgg 3840 gacgcgtgtg gcatctgaagcaccaccagc gagcgagagc tagagagaag gaaagccacc 3900 gacttcaccg cctccgagctgctccgggtc gcgggtctgc agcgtctccg gccctccgcg 3960 cctacagctc aagccacatccgaaggggga gggagccggg agctgcgcgc ggggccgccg 4020 gggggagggg tggcaccgcccacgccgggc ggccacgaag ggcggggcag cgggcgcgcg 4080 cgcggcgggg ggaggggccggcgccgcgcc cgctgggaat tggggcccta gggggagggc 4140 ggaggcgccg acgaccgcggcacttaccgt tcgcggcgtg gcgcccggtg gtccccaagg 4200 ggagggaagg gggaggcggggcgaggacag tgaccggagt ctcctcagcg gtggcttttc 4260 tgcttggcag cctcagcggctggcgccaaa accggactcc gcccacttcc tcgcccgccg 4320 gtgcgagggt gtggaatcctccagacgctg ggggaggggg agttgggagc ttaaaaacta 4380 gtaccccttt gggaccactttcagcagcga actctcctgt acaccagggg tcagttccac 4440 agacgcgggc caggggtgggtcattgcggc gtgaacaata atttgactag aagttgattc 4500 gggtgtttcc ggaaggggccgagtcaatcc gccgagttgg ggcacggaaa acaaaaaggg 4560 aaggctacta agatttttctggcgggggtt atcattggcg taactgcagg gaccacctcc 4620 cgggttgagg gggctggatctccaggctgc ggattaagcc cctcccgtcg gcgttaattt 4680 caaactgcgc gacgtttctcacctgccttc gccaaggcag gggccgggac cctattccaa 4740 gaggtagtaa ctagcaggactctagccttc cgcaattcat tgagcgcatt tacggaagta 4800 acgtcgggta ctgtctctggccgcaagggt gggaggagta cgcatttggc gtaaggtggg 4860 gcgtagagcc ttcccgccattggcggcgga tagggcgttt acgcgacggc ctgacgtagc 4920 ggaagacgcg ttagtgggggggaaggttct agaaaagcgg cggcagcggc tctagcggca 4980 gtagcagcag cgccgggtcccgtgcggagg tgctcctcgc agagttgttt ctcgagcagc 5040 ggcagttctc 
actacagcgccaggacgagt ccggttcgtg ttcgtccgcg gagatctctc 5100 tcatctcgct cggctgcgggaaatcgggct gaagcgactg agtccgcgat ggaggtaacg 5160 ggtttgaaat caatgagttattgaaaaggg catggcgagg ccgttggcgc ctcagtggaa 5220 gtcggccagc cgcctccgtgggagagaggc aggaaatcgg accaattcag tagcagtggg 5280 gcttaaggtt tatgaacggggtcttgagcg gaggcctgag cgtacaaaca gcttccccac 5340 cctcagcctc ccggcgccatttcccttcac tgggggtggg ggatggggag ctttcacatg 5400 gcggacgctg ccccgctggggtgaaagtgg ggcgcggagg cgggaattct tattcccttt 5460 ctaaagcacg ctgcttcgggggccacggcg tctcctcggc gagcgtttcg gcgggcagca 5520 ggtcctcgtg agcgaggctgcggagcttcc cctccccctc tctcccggga accgatttgg 5580 cggccgccat tttcatggctcgccttcctc tcagcgtttt ccttataact cttttatttt 5640 cttagtgtgc tttctctatcaagaagtaga agtggttaac tatttttttt ttcttctcgg 5700 gctgttttca tatcgtttcgaggtggattt ggagtgtttt gtgagcttgg atctttagag 5760 tcctgcgcac ctcattaaaggcgctcagcc ttcccctcga tgaaatggcg ccattgcgtt 5820 cggaagccac accgaagagcggggaggggg ggtgctccgg gtttgcgggc ccggtttcag 5880 agaagatatc accacccagggcgtcgggcc gggttcaatg cgagccgtag gacaaagaaa 5940 ccattttatg tttttcctgtcttttttttc ctttgagtaa cggttttatc tgggtctgca 6000 gtcagtaaaa cgacagatgaaccgcggcaa aataaacata aattggaagc catcggccac 6060 gaggggcagg gacgaaggtggttttctggg cgggggaggg atattcgcgt cagaatcctt 6120 tactgttctt aaggattccgtttaagttgt agagctgact cattttaagt aatgttgtta 6180 ctgagaagtt taacccttacgggacagatc catggacctt tatagatgat tacgaggaaa 6240 gtgaaataac gattttgtccttagttatac ttcgattaaa acatggcttc agaggctcct 6300 tcctgtaatg cgtatggattgatgtgcaaa actgttttgg gcctgggccg ctctgtattt 6360 gaactttgtt acttttctcattttgtttgc aatcttggtt gaacattaca ttgataagca 6420 taaggtctca agcgaagggggtctacctgg ttatttttct ttgaccctaa gcacgtttat 6480 aaaataacat tgtttaaaatcgatagtgga catcgggtaa gtttggataa attgtgaggt 6540 aagtaatgag tttttgctttttgttagtga tttgtaaaac ttgttataaa tgtacattat 6600 ccgtaatttc agtttagagataacctatgt gctgacgaca attaagaata aaaactagct 6660 gaaaaaatga aaataactatcgtgacaagt aaccatttca aaagactgct ttgtgtctca 6720 taggagctag tttgatcatttcagttaatt ttttctttaa 
tttttacgag tcatgaaaac 6780 tacaggaaaa aaaatctgaactgggtttta ccactacttt ttaggagttg ggagcatgcg 6840 aatggaggga gagctccgtagaactgggat gagagcagca attaatgctg cttgctagga 6900 acaaaaaata attgattgaaaattacgtgt gactttttag tttgcattat gcgtttgtag 6960 cagttggtcc tggatatcactttctctcgt ttgaggtttt ttaacctagt taacttttaa 7020 gacaggtttc cttaacattcataagtgccc agaatacagc tgtgtagtac agcatataaa 7080 gatttcagct ctgaggtttttcctattgac ttggaaaatt gttttgtgcc tgtcgcttgc 7140 cacatggcca atcaagtaagcttcgaattc gagctcgccc aactccgccc gttttatgac 7200 tagaaccaat agtttttaatgccaaatgca ctgaaatccc ctaatttgca aagccaaacg 7260 ccccctatgt gagtaatacggggacttttt acccaatttc ccaagcggaa agccccctaa 7320 tacactcata tggcatatgaatcagcacgg tcatgcactc taatggcggc ccatagggac 7380 tttccacata gggggcgttcaccatttccc agcatagggg tggtgactca atggccttta 7440 cccaagtaca ttgggtcaatgggaggtaag ccaatgggtt tttcccatta ctggcaagca 7500 cactgagtca aatgggactttccactgggt tttgcccaag tacattgggt caatgggagg 7560 tgagccaatg ggaaaaacccattgctgcca agtacactga ctcaataggg actttccaat 7620 gggtttttcc attgttggcaagcatataag gtcaatgtgg gtgagtcaat agggactttc 7680 cattgtattc tgcccagtacataaggtcaa tagggggtga atcaacagga aagtcccatt 7740 ggagccaagt acactgcgtcaatagggact ttccattggg ttttgcccag tacataaggt 7800 caatagggga tgagtcaatgggaaaaaccc attggagcca agtacactga ctcaataggg 7860 actttccatt gggttttgcccagtacataa ggtcaatagg gggtgagtca acaggaaagt 7920 cccattggag ccaagtacattgagtcaata gggactttcc aatgggtttt gcccagtaca 7980 taaggtcaat gggaggtaagccaatgggtt tttcccatta ctggcacgta tactgagtca 8040 ttagggactt tccaatgggttttgcccagt acataaggtc aataggggtg aatcaacagg 8100 aaagtcccat tggagccaagtacactgagt caatagggac tttccattgg gttttgccca 8160 gtacaaaagg tcaatagggggtgagtcaat gggtttttcc cattattggc acgtacataa 8220 ggtcaatagg ggtgagtcattgggtttttc cagccaattt aattaaaacg ccatgtactt 8280 tcccaccatt gacgtcaatgggctattgaa actaatgcaa cgtgaccttt aaacggtact 8340 ttcccatagc tgattaatgggaaagtaccg ttctcgagcc aatacacgtc aatgggaagt 8400 gaaagggcag ccaaaacgtaacaccgcccc ggttttcccc tggaaattcc atattggcac 8460 gcattctatt 
ggctgagctgcgttctacgt gggtataaga ggcgcgacca gcgtcggtac 8520 cgtcgcagtc ttcggtctgaccaccgtaga acgcagagct cctcgctgca gcccgggtct 8580 agaggatccg cctgagaaaggaagtgagct gtaaaggctg agctctctct ctgacgtatg 8640 tagcctctgg ttagcttcgtcactcactgt tcttgactca gcatggcaat ctgatgaaat 8700 cccagctgta agtctgcagaaattgatgat ctattaaaca ataaagatgt ccactaaaat 8760 ggaagttttt cctgtcatactttgttaaga agggtgagaa cagagtacct acattttgaa 8820 tggaaggatt ggagctacgggggtgggggt ggggtgggat tagataaatg cctgctcttt 8880 actgaaggct ctttactattgctttatgat aatgtttcat agttggatat cataatttaa 8940 acaagcaaaa ccaaattaagggccagctca ttcctccaga tccactagta attctgtgga 9000 atgtgtgtca gttagggtgtggaaagtccc caggctcccc agcaggcaga agtatgcaaa 9060 gcatgcatct caattagtcagcaaccaggt gtggaaagtc cccaggctcc ccagcaggca 9120 gaagtatgca aagcatgcatctcaattagt cagcaaccat agtcccgccc ctaactccgc 9180 ccatcccgcc cctaactccgcccagttccg cccattctcc gccccatggc tgactaattt 9240 tttttattta tgcagaggccgaggccgcct ctgcctctga gctattccag aagtagtgag 9300 gaggcttttt tggaggcctaggcttttgca aaaagctccc gggagcttgt atatccattt 9360 tcggatctga tcaagagacaggatgaggat cgtttcgcat gattgaacaa gatggattgc 9420 acgcaggttc tccggccgcttgggtggaga ggctattcgg ctatgactgg gcacaacaga 9480 caatcggctg ctctgatgccgccgtgttcc ggctgtcagc gcaggggcgc ccggttcttt 9540 ttgtcaagac cgacctgtccggtgccctga atgaactgca ggacgaggca gcgcggctat 9600 cstggctggc cacgacgggcgttccttgcg cagctgtgct cgacgttgtc actgaagcgg 9660 gaagggactg gctgctattgggcgaagtgc cggggcagga tctcctgtca tctcaccttg 9720 ctcctgccga gaaagtatccatcatggctg atgcaatgcg gcggctgcat acgcttgatc 9780 cggctacctg cccattcgaccaccaagcga aacatcgcat cgagcgagca cgtactcgga 9840 tggaagccgg tcttgtcgatcaggatgatc tggacgaaga gcatcagggg ctcgcgccag 9900 ccgaactgtt cgccaggctcaaggcgcgca tgcccgacgg cgaggatctc gtcgtgaccc 9960 atggcgatgc ctgcttgccgaatatcatgg tggaaaatgg ccgcttttct ggattcatcg 10020 actgtggccg gctgggtgtggcggaccgct atcaggacat agcgttggct acccgtgata 10080 ttgctgaaga gcttggcggcgaatgggctg accgcttcct cgtgctttac ggtatcgccg 10140 ctcccgattc gcagcgcatcgccttctatc gccttcttga 
cgagttcttc tgagcgggac 10200 tctggggttc gaaatgaccgaccaagcgac gcccaacctg ccatcacgag atttcgattc 10260 caccgccgcc ttctatgaaaggttgggctt cggaatcgtt ttccgggacg ccggctggat 10320 gatcctccag cgcggggatctcatgctgga gttcttcgcc caccccaact tgtttattgc 10380 agcttataat ggttacaaataaagcaatag catcacaaat ttcacaaata aagcattttt 10440 ttcactgcat tctagttgtggtttgtccaa actcatcaat gtatcttatc atgtctgtat 10500 accgtcgaga ctagttctagagcggccgcc accgcggtgg agctccagct tttgttccct 10560 ttagtgaggg ttaatttcgagcttggcgta atcatggtca tagctgtttc ctgtgtgaaa 10620 ttgttatccg ctcacaattccacacaacat acgagccgga agcataaagt gtaaagcctg 10680 gggtgcctaa tgagtgagctaactcacatt aattgcgttg cgctcactgc ccgctttcca 10740 gtcgggaaac ctgtcgtgccagggggtacc taggccgggc aacaattggc ggccggccgc 10800 acttttcggg gaaatgtgcgcggaacccct atttgtttat ttttctaaat acattcaaat 10860 atgtatccgc tcatgagacaataaccctga taaatgcttc aataatattg aaaaaggaag 10920 agtatgagta ttcaacatttccgtgtcgcc cttattccct tttttgcggc attttgcctt 10980 cctgtttttg ctcacccagaaacgctggtg aaagtaaaag atgctgaaga tcagttgggt 11040 gcacgagtgg gttacatcgaactggatctc aacagcggta agatccttga gagttttcgc 11100 cccgaagaac gttttccaatgatgagcact tttaaagttc tgctatgtgg cgcggtatta 11160 tcccgtattg acgccgggcaagagcaactc ggtcgccgca tacactattc tcagaatgac 11220 ttggttgagt actcaccagtcacagaaaag catcttacgg atggcatgac agtaagagaa 11280 ttatgcagtg ctgccataaccatgagtgat aacactgcgg ccaacttact tctgacaacg 11340 atcggaggac cgaaggagctaaccgctttt ttgcacaaca tgggggatca tgtaactcgc 11400 cttgatcgtt gggaaccggagctgaatgaa gccataccaa acgacgagcg tgacaccacg 11460 atgcctgtag caatggcaacaacgttgcgc aaactattaa ctggcgaact acttactcta 11520 gcttcccggc aacaattaatagactggatg gaggcggata aagttgcagg accacttctg 11580 cgctcggccc ttccggctggctggtttatt gctgataaat ctggagccgg tgagcgtggg 11640 tctcgcggta tcattgcagcactggggcca gatggtaagc cctcccgtat cgtagttatc 11700 tacacgacgg ggagtcaggcaactatggat gaacgaaata gacagatcgc tgagataggt 11760 gcctcactga ttaagcattggtaactgtca gaccctaggc cgggcaacaa ttggcggccg 11820 gccctgcatt aatgaatcggccaacgcgcg gggagaggcg gtttgcgtat 
tgggcgctct 11880 tccgcttcct cgctcactgactcgctgcgc tcggtcgttc ggctgcggcg agcggtatca 11940 gctcactcaa aggcggtaatacggttatcc acagaatcag gggataacgc aggaaagaac 12000 atgtgagcaa aaggccagcaaaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt 12060 ttccataggc tccgcccccctgacgagcat cacaaaaatc gacgctcaag tcagaggtgg 12120 cgaaacccga caggactataaagataccag gcgtttcccc ctggaagctc cctcgtgcgc 12180 tctcctgttc cgaccctgccgcttaccgga tacctgtccg cctttctccc ttcgggaagc 12240 gtggcgcttt ctcatagctcacgctgtagg tatctcagtt cggtgtaggt cgttcgctcc 12300 aagctgggct gtgtgcacgaaccccccgtt cagcccgacc gctgcgcctt atccggtaac 12360 tatcgtcttg agtccaacccggtaagacac gacttatcgc cactggcagc agccactggt 12420 aacaggatta gcagagcgaggtatgtaggc ggtgctacag agttcttgaa gtggtggcct 12480 aactacggct acactagaaggacagtattt ggtatctgcg ctctgctgaa gccagttacc 12540 ttcggaaaaa gagttggtagctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt 12600 ttttttgttt gcaagcagcagattacgcgc agaaaaaaag gatctcaaga agatcctttg 12660 atcttttcta cggggtctgacgctcagtgg aacgaaaact c 12701 2 12109 DNA Artificial Sequence ArtificialSequence containing human UCOE elements and vector sequence 2 acgttgtaaaacgacggcca gtgaattgta atacgactca ctatagggcg aattgggtac 60 cgggccccccctcgaggtcg agttggggtg gggaaaagga agaaacgcgg gcgtattggc 120 cccaatggggtctcggtggg gtatcgacag agtgccagcc ctgggaccga accccgcgtt 180 tatgaacaaacgacccaaca cccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240 cgggttccttccggtattgt ctccttccgt cgacggtatc aaggtggcga ccggaatggt 300 gagctgcgagaatagccggg cgcgctgtga gccgaagtcg cccccgccct ggccacttcc 360 ggcgcgccgagtccttaggc cgccaggggg cgccggcgcg cgcccagatt ggggacaaag 420 gaagccgggccggccgcgtt attaccataa aaggcaaaca ctggtcggag gcgtccccgc 480 ggcgcgcggcaggaagccag gccccaaccc cctcccaacc gggcgccagc cccgcctccg 540 cccggttcaaacagcgaccg ggtcgcgcgc gcgcacgcag cggccacacc ctcgggcgcc 600 agcggctcgggcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgc gcgattagcg 660 ccattgagtcccagcgcgca cgcgcaatta gcgccaattc ccagcgcgca cgcagttagc 720 gcccaaaggaccagcgcgca cgcgcatggc gccccagccc ccaccgggcc tgacgggggc 780 
tacgccgcgcccaccgtgcg atccccattg gcaagagccc ggctcagaca aagaccccgc 840 cggttgcccccgccccgaga gcggcacccc cggagcgcgc ccgcccgagc gcggcctcgc 900 gcctgcgaactggcgtgggg tgtcccccat ctccggaggc ccaggggctt ctcccgcgcc 960 ccccacggcggtccggttcc gccccatgcg ccccccgctg cggcccagac ggcggctctg 1020 cacgggcgaagggccgcggc cgcatgcccc ggtcggctgg ccgggcttac ctggcggcgg 1080 gtgtggacgggcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcg gacgcggtct 1140 cggcggtggtggcgcgtcgc gccgctgggt tttatagggc gccgccgcgg ccgctcgagc 1200 cataaaaggcaactttcgga acggcgcacg ctgattggcc ccgcgccgct cactcaccgg 1260 cttcgccgcacagtgcagca tttttttacc ccctctcccc tccttttgcg aaaaaaaaaa 1320 agagcgagagcgagattgag gaagaggagg agggagagtt ttggcgttgg ccgccttggg 1380 gtgctgggcccgggggctgg gggcgcgcgc cgtggccccc gcgccccacg ctgggcagtg 1440 cccggttcggccccgcatgg ccaggcctgc ccccggcctg cccgtctctc gggcccccca 1500 cccaccgcgggacatcctag gtgtggacat ctcttgggca ctgagcgccc aggtggggtg 1560 ggccagggtctgcacgggtg ccagggccct gggttctgta cgctcctgca gaaggagctc 1620 ttggagggcatggagtggcc aggcagtcac tcccccttgc cgacttcaga gcaactgccc 1680 tgaaagcagggcctgaggac ctctggctgt ggggctcagc tagctaaatg tgctgggtgg 1740 gtcactagggagagacctgg gcttgagagg tagagtgtgg tgttggggga gtcaggtggc 1800 ttgcggccattagagtcgca ggaccacact ccccaggaca gggcaggggc cagcggtcca 1860 gtggctggaggtggcccgtg atgaaggcta caaacctacc cagccgcagc cctgggaagg 1920 aagtgggctctacagggcag ggcacctttt accctggagc tgcctgcttt tgagggtaac 1980 agtcacgcccagccaagacc aggcctgggg cgttagtggg tgacctaggc actgcggggc 2040 gggggggctgggtctacaca gcctgggtct gggcccaccg tccgttgtat gtctgctatg 2100 cgcagccacagctgaactgc cctcccagac catctggagg ccgctggggg actctgggga 2160 ccaagactccatgtgccaca gaggattggg ggcggggcgg tgctaggaac tcaaagccag 2220 cctgggaagaccctgtcctt gtcacccttt cttgccttgg gtctgtccac tgagtagcac 2280 acaagaccgggtgggcaggg tccgttctgc tccgggaatc acagactgtg tgtacccagg 2340 tggtgggcatgcagcgatca gtggcgtggg accacagagg gggcccgcgg tacctaaaac 2400 agcttcacatggcttaaaat aggggaccaa tgtcttttcc aatctaagtc ccatttataa 2460 taaagtccatgttccatttt taaaggacaa tcctttcggt 
ttaaaaccag gcacgattac 2520 ccaaacaactcacaacggta aagcactgtg aatcttctct gttctgcaat cccaacttgg 2580 tttctgctcagaaaccctcc ctctttccaa tcggtaatta aataacaaaa ggaaaaaact 2640 taagatgcttcaaccccgtt tcgtgacact ttgaaaaaag aatcacctct tgcaaacacc 2700 cgctcccgacccccgccgct gaagcccggc gtccagaggc ctaagcgcgg gtgcccgccc 2760 ccacccgggagcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaa aggccgcggc 2820 acgggggctcaagggcactg cgccacaccg cacgcgccta cccccgcgcg gccacgttaa 2880 ctggcggtcgccgcagcctc gggacagccg gccgcgcgcc gccaggctcg cggacgcggg 2940 accacgcgccgccctccggg aggcccaagt ctcgacccag ccccgcgtgg cgctggggga 3000 gggggcgcctccgccggaac gcgggtgggg gaggggaggg ggaaatgcgc tttgtctcga 3060 aatggggcaaccgtcgccac agctccctac cccctcgagg gcagagcagt ccccccacta 3120 actaccgggctggccgcgcg ccaggccagc cgcgaggcca ccgcccgacc ctccactcct 3180 tcccgcagctcccggcgcgg ggtccggcga gaaggggagg ggaggggagc ggagaaccgg 3240 gcccccgggacgcgtgtggc atctgaagca ccaccagcga gcgagagcta gagagaagga 3300 aagccaccgacttcaccgcc tccgagctgc tccgggtcgc gggtctgcag cgtctccggc 3360 cctccgcgcctacagctcaa gccacatccg aagggggagg gagccgggag ctgcgcgcgg 3420 ggccgccggggggaggggtg gcaccgccca cgccgggcgg ccacgaaggg cggggcagcg 3480 ggcgcgcgcgcggcgggggg aggggccggc gccgcgcccg ctgggaattg gggccctagg 3540 gggagggcggaggcgccgac gaccgcggca cttaccgttc gcggcgtggc gcccggtggt 3600 ccccaaggggagggaagggg gaggcggggc gaggacagtg accggagtct cctcagcggt 3660 ggcttttctgcttggcagcc tcagcggctg gcgccaaaac cggactccgc ccacttcctc 3720 gcccgccggtgcgagggtgt ggaatcctcc agacgctggg ggagggggag ttgggagctt 3780 aaaaactagtacccctttgg gaccactttc agcagcgaac tctcctgtac accaggggtc 3840 agttccacagacgcgggcca ggggtgggtc attgcggcgt gaacaataat ttgactagaa 3900 gttgattcgggtgtttccgg aaggggccga gtcaatccgc cgagttgggg cacggaaaac 3960 aaaaagggaaggctactaag atttttctgg cgggggttat cattggcgta actgcaggga 4020 ccacctcccgggttgagggg gctggatctc caggctgcgg attaagcccc tcccgtcggc 4080 gttaatttcaaactgcgcga cgtttctcac ctgccttcgc caaggcaggg gccgggaccc 4140 tattccaagaggtagtaact agcaggactc tagccttccg caattcattg agcgcattta 4200 
cggaagtaacgtcgggtact gtctctggcc gcaagggtgg gaggagtacg catttggcgt 4260 aaggtggggcgtagagcctt cccgccattg gcggcggata gggcgtttac gcgacggcct 4320 gacgtagcggaagacgcgtt agtggggggg aaggttctag aaaagcggcg gcagcggctc 4380 tagcggcagtagcagcagcg ccgggtcccg tgcggaggtg ctcctcgcag agttgtttct 4440 cgagcagcggcagttctcac tacagcgcca ggacgagtcc ggttcgtgtt cgtccgcgga 4500 gatctctctcatctcgctcg gctgcgggaa atcgggctga agcgactgag tccgcgatgg 4560 aggtaacgggtttgaaatca atgagttatt gaaaagggca tggcgaggcc gttggcgcct 4620 cagtggaagtcggccagccg cctccgtggg agagaggcag gaaatcggac caattcagta 4680 gcagtggggcttaaggttta tgaacggggt cttgagcgga ggcctgagcg tacaaacagc 4740 ttccccaccctcagcctccc ggcgccattt cccttcactg ggggtggggg atggggagct 4800 ttcacatggcggacgctgcc ccgctggggt gaaagtgggg cgcggaggcg ggaattctta 4860 ttccctttctaaagcacgct gcttcggggg ccacggcgtc tcctcggcga gcgtttcggc 4920 gggcagcaggtcctcgtgag cgaggctgcg gagcttcccc tccccctctc tcccgggaac 4980 cgatttggcggccgccattt tcatggctcg ccttcctctc agcgttttcc ttataactct 5040 tttattttcttagtgtgctt tctctatcaa gaagtagaag tggttaacta tttttttttt 5100 cttctcgggctgttttcata tcgtttcgag gtggatttgg agtgttttgt gagcttggat 5160 ctttagagtcctgcgcacct cattaaaggc gctcagcctt cccctcgatg aaatggcgcc 5220 attgcgttcggaagccacac cgaagagcgg ggaggggggg tgctccgggt ttgcgggccc 5280 ggtttcagagaagatatcac cacccagggc gtcgggccgg gttcaatgcg agccgtagga 5340 caaagaaaccattttatgtt tttcctgtct tttttttcct ttgagtaacg gttttatctg 5400 ggtctgcagtcagtaaaacg acagatgaac cgcggcaaaa taaacataaa ttggaagcca 5460 tcggccacgaggggcaggga cgaaggtggt tttctgggcg ggggagggat attcgcgtca 5520 gaatcctttactgttcttaa ggattccgtt taagttgtag agctgactca ttttaagtaa 5580 tgttgttactgagaagttta acccttacgg gacagatcca tggaccttta tagatgatta 5640 cgaggaaagtgaaataacga ttttgtcctt agttatactt cgattaaaac atggcttcag 5700 aggctccttcctgtaatgcg tatggattga tgtgcaaaac tgttttgggc ctgggccgct 5760 ctgtatttgaactttgttac ttttctcatt ttgtttgcaa tcttggttga acattacatt 5820 gataagcataaggtctcaag cgaagggggt ctacctggtt atttttcttt gaccctaagc 5880 acgtttataaaataacattg tttaaaatcg 
atagtggaca tcgggtaagt ttggataaat 5940 tgtgaggtaagtaatgagtt tttgcttttt gttagtgatt tgtaaaactt gttataaatg 6000 tacattatccgtaatttcag tttagagata acctatgtgc tgacgacaat taagaataaa 6060 aactagctgaaaaaatgaaa ataactatcg tgacaagtaa ccatttcaaa agactgcttt 6120 gtgtctcataggagctagtt tgatcatttc agttaatttt ttctttaatt tttacgagtc 6180 atgaaaactacaggaaaaaa aatctgaact gggttttacc actacttttt aggagttggg 6240 agcatgcgaatggagggaga gctccgtaga actgggatga gagcagcaat taatgctgct 6300 tgctaggaacaaaaaataat tgattgaaaa ttacgtgtga ctttttagtt tgcattatgc 6360 gtttgtagcagttggtcctg gatatcactt tctctcgttt gaggtttttt aacctagtta 6420 acttttaagacaggtttcct taacattcat aagtgcccag aatacagctg tgtagtacag 6480 catataaagatttcagctct gaggtttttc ctattgactt ggaaaattgt tttgtgcctg 6540 tcgcttgccacatggccaat caagtaagct tcgaattcga gctcgcccaa ctccgcccgt 6600 tttatgactagaaccaatag tttttaatgc caaatgcact gaaatcccct aatttgcaaa 6660 gccaaacgccccctatgtga gtaatacggg gactttttac ccaatttccc aagcggaaag 6720 ccccctaatacactcatatg gcatatgaat cagcacggtc atgcactcta atggcggccc 6780 atagggactttccacatagg gggcgttcac catttcccag cataggggtg gtgactcaat 6840 ggcctttacccaagtacatt gggtcaatgg gaggtaagcc aatgggtttt tcccattact 6900 ggcaagcacactgagtcaaa tgggactttc cactgggttt tgcccaagta cattgggtca 6960 atgggaggtgagccaatggg aaaaacccat tgctgccaag tacactgact caatagggac 7020 tttccaatgggtttttccat tgttggcaag catataaggt caatgtgggt gagtcaatag 7080 ggactttccattgtattctg cccagtacat aaggtcaata gggggtgaat caacaggaaa 7140 gtcccattggagccaagtac actgcgtcaa tagggacttt ccattgggtt ttgcccagta 7200 cataaggtcaataggggatg agtcaatggg aaaaacccat tggagccaag tacactgact 7260 caatagggactttccattgg gttttgccca gtacataagg tcaatagggg gtgagtcaac 7320 aggaaagtcccattggagcc aagtacattg agtcaatagg gactttccaa tgggttttgc 7380 ccagtacataaggtcaatgg gaggtaagcc aatgggtttt tcccattact ggcacgtata 7440 ctgagtcattagggactttc caatgggttt tgcccagtac ataaggtcaa taggggtgaa 7500 tcaacaggaaagtcccattg gagccaagta cactgagtca atagggactt tccattgggt 7560 tttgcccagtacaaaaggtc aatagggggt gagtcaatgg gtttttccca ttattggcac 7620 
gtacataaggtcaatagggg tgagtcattg ggtttttcca gccaatttaa ttaaaacgcc 7680 atgtactttcccaccattga cgtcaatggg ctattgaaac taatgcaacg tgacctttaa 7740 acggtactttcccatagctg attaatggga aagtaccgtt ctcgagccaa tacacgtcaa 7800 tgggaagtgaaagggcagcc aaaacgtaac accgccccgg ttttcccctg gaaattccat 7860 attggcacgcattctattgg ctgagctgcg ttctacgtgg gtataagagg cgcgaccagc 7920 gtcggtaccgtcgcagtctt cggtctgacc accgtagaac gcagagctcc tcgctgcagc 7980 ccgggtctagaggatccgcc tgagaaagga agtgagctgt aaaggctgag ctctctctct 8040 gacgtatgtagcctctggtt agcttcgtca ctcactgttc ttgactcagc atggcaatct 8100 gatgaaatcccagctgtaag tctgcagaaa ttgatgatct attaaacaat aaagatgtcc 8160 actaaaatggaagtttttcc tgtcatactt tgttaagaag ggtgagaaca gagtacctac 8220 attttgaatggaaggattgg agctacgggg gtgggggtgg ggtgggatta gataaatgcc 8280 tgctctttactgaaggctct ttactattgc tttatgataa tgtttcatag ttggatatca 8340 taatttaaacaagcaaaacc aaattaaggg ccagctcatt cctccagatc cactagtaat 8400 tctgtggaatgtgtgtcagt tagggtgtgg aaagtcccca ggctccccag caggcagaag 8460 tatgcaaagcatgcatctca attagtcagc aaccaggtgt ggaaagtccc caggctcccc 8520 agcaggcagaagtatgcaaa gcatgcatct caattagtca gcaaccatag tcccgcccct 8580 aactccgcccatcccgcccc taactccgcc cagttccgcc cattctccgc cccatggctg 8640 actaattttttttatttatg cagaggccga ggccgcctct gcctctgagc tattccagaa 8700 gtagtgaggaggcttttttg gaggcctagg cttttgcaaa aagctcccgg gagcttgtat 8760 atccattttcggatctgatc aagagacagg atgaggatcg tttcgcatga ttgaacaaga 8820 tggattgcacgcaggttctc cggccgcttg ggtggagagg ctattcggct atgactgggc 8880 acaacagacaatcggctgct ctgatgccgc cgtgttccgg ctgtcagcgc aggggcgccc 8940 ggttctttttgtcaagaccg acctgtccgg tgccctgaat gaactgcagg acgaggcagc 9000 gcggctatcstggctggcca cgacgggcgt tccttgcgca gctgtgctcg acgttgtcac 9060 tgaagcgggaagggactggc tgctattggg cgaagtgccg gggcaggatc tcctgtcatc 9120 tcaccttgctcctgccgaga aagtatccat catggctgat gcaatgcggc ggctgcatac 9180 gcttgatccggctacctgcc cattcgacca ccaagcgaaa catcgcatcg agcgagcacg 9240 tactcggatggaagccggtc ttgtcgatca ggatgatctg gacgaagagc atcaggggct 9300 cgcgccagccgaactgttcg ccaggctcaa 
ggcgcgcatg cccgacggcg aggatctcgt 9360 cgtgacccatggcgatgcct gcttgccgaa tatcatggtg gaaaatggcc gcttttctgg 9420 attcatcgactgtggccggc tgggtgtggc ggaccgctat caggacatag cgttggctac 9480 ccgtgatattgctgaagagc ttggcggcga atgggctgac cgcttcctcg tgctttacgg 9540 tatcgccgctcccgattcgc agcgcatcgc cttctatcgc cttcttgacg agttcttctg 9600 agcgggactctggggttcga aatgaccgac caagcgacgc ccaacctgcc atcacgagat 9660 ttcgattccaccgccgcctt ctatgaaagg ttgggcttcg gaatcgtttt ccgggacgcc 9720 ggctggatgatcctccagcg cggggatctc atgctggagt tcttcgccca ccccaacttg 9780 tttattgcagcttataatgg ttacaaataa agcaatagca tcacaaattt cacaaataaa 9840 gcatttttttcactgcattc tagttgtggt ttgtccaaac tcatcaatgt atcttatcat 9900 gtctgtataccgtcgagact agttctagag cggccgccac cgcggtggag ctccagcttt 9960 tgttccctttagtgagggtt aatttcgagc ttggcgtaat catggtcata gctgtttcct 10020 gtgtgaaattgttatccgct cacaattcca cacaacatac gagccggaag cataaagtgt 10080 aaagcctggggtgcctaatg agtgagctaa ctcacattaa ttgcgttgcg ctcactgccc 10140 gctttccagtcgggaaacct gtcgtgccag ggggtaccta ggccgggcaa caattggcgg 10200 ccggccgcacttttcgggga aatgtgcgcg gaacccctat ttgtttattt ttctaaatac 10260 attcaaatatgtatccgctc atgagacaat aaccctgata aatgcttcaa taatattgaa 10320 aaaggaagagtatgagtatt caacatttcc gtgtcgccct tattcccttt tttgcggcat 10380 tttgccttcctgtttttgct cacccagaaa cgctggtgaa agtaaaagat gctgaagatc 10440 agttgggtgcacgagtgggt tacatcgaac tggatctcaa cagcggtaag atccttgaga 10500 gttttcgccccgaagaacgt tttccaatga tgagcacttt taaagttctg ctatgtggcg 10560 cggtattatcccgtattgac gccgggcaag agcaactcgg tcgccgcata cactattctc 10620 agaatgacttggttgagtac tcaccagtca cagaaaagca tcttacggat ggcatgacag 10680 taagagaattatgcagtgct gccataacca tgagtgataa cactgcggcc aacttacttc 10740 tgacaacgatcggaggaccg aaggagctaa ccgctttttt gcacaacatg ggggatcatg 10800 taactcgccttgatcgttgg gaaccggagc tgaatgaagc cataccaaac gacgagcgtg 10860 acaccacgatgcctgtagca atggcaacaa cgttgcgcaa actattaact ggcgaactac 10920 ttactctagcttcccggcaa caattaatag actggatgga ggcggataaa gttgcaggac 10980 cacttctgcgctcggccctt ccggctggct ggtttattgc tgataaatct 
ggagccggtg 11040 agcgtgggtctcgcggtatc attgcagcac tggggccaga tggtaagccc tcccgtatcg 11100 tagttatctacacgacgggg agtcaggcaa ctatggatga acgaaataga cagatcgctg 11160 agataggtgcctcactgatt aagcattggt aactgtcaga ccctaggccg ggcaacaatt 11220 ggcggccggccctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg 11280 ggcgctcttccgcttcctcg ctcactgact cgctgcgctc ggtcgttcgg ctgcggcgag 11340 cggtatcagctcactcaaag gcggtaatac ggttatccac agaatcaggg gataacgcag 11400 gaaagaacatgtgagcaaaa ggccagcaaa aggccaggaa ccgtaaaaag gccgcgttgc 11460 tggcgtttttccataggctc cgcccccctg acgagcatca caaaaatcga cgctcaagtc 11520 agaggtggcgaaacccgaca ggactataaa gataccaggc gtttccccct ggaagctccc 11580 tcgtgcgctctcctgttccg accctgccgc ttaccggata cctgtccgcc tttctccctt 11640 cgggaagcgtggcgctttct catagctcac gctgtaggta tctcagttcg gtgtaggtcg 11700 ttcgctccaagctgggctgt gtgcacgaac cccccgttca gcccgaccgc tgcgccttat 11760 ccggtaactatcgtcttgag tccaacccgg taagacacga cttatcgcca ctggcagcag 11820 ccactggtaacaggattagc agagcgaggt atgtaggcgg tgctacagag ttcttgaagt 11880 ggtggcctaactacggctac actagaagga cagtatttgg tatctgcgct ctgctgaagc 11940 cagttaccttcggaaaaaga gttggtagct cttgatccgg caaacaaacc accgctggta 12000 gcggtggtttttttgtttgc aagcagcaga ttacgcgcag aaaaaaagga tctcaagaag 12060 atcctttgatcttttctacg gggtctgacg ctcagtggaa cgaaaactc 12109 3 12680 DNA ArtificialSequence Artificial Sequence containing human UCOE elements and vectorsequence 3 acgttgtaaa acgacggcca gtgaattgta atacgactca ctatagggcgaattgggtac 60 cgggcccccc ctcgaggtcg agttggggtg gggaaaagga agaaacgcgggcgtattggc 120 cccaatgggg tctcggtggg gtatcgacag agtgccagcc ctgggaccgaaccccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtc tttttattgccgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgt cgacgatctg acggttcactaaaccagctc 300 tgcttatata gacctcccac cgtacacgcc taccgcccat ttgcgtcaatggggcggagt 360 tgttacgaca ttttggaaag tcccgttgat tttggtgcca aaacaaactcccattgacgt 420 caatggggtg gagacttgga aatccccgtg agtcaaaccg ctatccacgcccattgatgt 480 actgccaaaa ccgcatcacc atggtaatag cgatgactaa 
tacgtagatgtactgccaag 540 taggaaagtc ccataaggtc atgtactggg cataatgcca ggcgggccatttaccgtcat 600 tgacgtcaat agggggcgta cttggcatat gatacacttg atgtactgccaagtgggcag 660 tttaccgtaa atactccacc cattgacgtc aatggaaagt ccctattggcgttactatgg 720 gaacatacgt cattattgac gtcaatgggc gggggtcgtt gggcggtcagccaggcgggc 780 catttaccgt aagttatgta acgcggaact ccatatatgg gctatgaactaatgaccccg 840 taattgatta ctattaataa ctcgacggta tcatggtggc gaccggcatggtgagctgcg 900 agaatagccg ggcgcgctgt gagccgaagt cgcccccgcc ctggccacttccggcgcgcc 960 gagtccttag gccgccaggg ggcgccggcg cgcgcccaga ttggggacaaaggaagccgg 1020 gccggccgcg ttattaccat aaaaggcaaa cactggtcgg aggcgtccccgcggcgcgcg 1080 gcaggaagcc aggccccaac cccctcccaa ccgggcgcca gccccgcctccgcccggttc 1140 aaacagcgac cgggtcgcgc gcgcgcacgc agcggccaca ccctcgggcgccagcggctc 1200 gggcaggaag tggcgcaagc gcccgggccc cagaacgcac gcgcgattagcgccattgag 1260 tcccagcgcg cacgcgcaat tagcgccaat tcccagcgcg cacgcagttagcgcccaaag 1320 gaccagcgcg cacgcgcatg gcgccccagc ccccaccggg cctgacgggggctacgccgc 1380 gcccaccgtg cgatccccat tggcaagagc ccggctcaga caaagaccccgccggttgcc 1440 cccgccccga gagcggcacc cccggagcgc gcccgcccga gcgcggcctcgcgcctgcga 1500 actggcgtgg ggtgtccccc atctccggag gcccaggggc ttctcccgcgccccccacgg 1560 cggtccggtt ccgccccatg cgccccccgc tgcggcccag acggcggctctgcacgggcg 1620 aagggccgcg gccgcatgcc ccggtcggct ggccgggctt acctggcggcgggtgtggac 1680 gggcggcgga tcggcaaagg cgaggctctg tgctcgcggg cggacgcggtctcggcggtg 1740 gtggcgcgtc gcgccgctgg gttttatagg gcgccgccgc ggccgctcgagccataaaag 1800 gcaactttcg gaacggcgca cgctgattgg ccccgcgccg ctcactcaccggcttcgccg 1860 cacagtgcag cattttttta ccccctctcc cctccttttg cgaaaaaaaaaaagagcgag 1920 agcgagattg aggaagagga ggagggagag ttttggcgtt ggccgccttggggtgctggg 1980 cccgggggct gggggcgcgc gccgtggccc ccgcgcccca cgctgggcagtgcccggttc 2040 ggccccgcat ggccaggcct gcccccggcc tgcccgtctc tcgggccccccacccaccgc 2100 gggacatcct aggtgtggac atctcttggg cactgagcgc ccaggtggggtgggccaggg 2160 tctgcacggg tgccagggcc ctgggttctg tacgctcctg cagaaggagctcttggaggg 2220 catggagtgg ccaggcagtc 
actccccctt gccgacttca gagcaactgccctgaaagca 2280 gggcctgagg acctctggct gtggggctca gctagctaaa tgtgctgggtgggtcactag 2340 ggagagacct gggcttgaga ggtagagtgt ggtgttgggg gagtcaggtggcttgcggcc 2400 attagagtcg caggaccaca ctccccagga cagggcaggg gccagcggtccagtggctgg 2460 aggtggcccg tgatgaaggc tacaaaccta cccagccgca gccctgggaaggaagtgggc 2520 tctacagggc agggcacctt ttaccctgga gctgcctgct tttgagggtaacagtcacgc 2580 ccagccaaga ccaggcctgg ggcgttagtg ggtgacctag gcactgcggggcgggggggc 2640 tgggtctaca cagcctgggt ctgggcccac cgtccgttgt atgtctgctatgcgcagcca 2700 cagctgaact gccctcccag accatctgga ggccgctggg ggactctggggaccaagact 2760 ccatgtgcca cagaggattg ggggcggggc ggtgctagga actcaaagccagcctgggaa 2820 gaccctgtcc ttgtcaccct ttcttgcctt gggtctgtcc actgagtagcacacaagacc 2880 gggtgggcag ggtccgttct gctccgggaa tcacagactg tgtgtacccaggtggtgggc 2940 atgcagcgat cagtggcgtg ggaccacaga gggggcccgc ggtacctaaaacagcttcac 3000 atggcttaaa ataggggacc aatgtctttt ccaatctaag tcccatttataataaagtcc 3060 atgttccatt tttaaaggac aatcctttcg gtttaaaacc aggcacgattacccaaacaa 3120 ctcacaacgg taaagcactg tgaatcttct ctgttctgca atcccaacttggtttctgct 3180 cagaaaccct ccctctttcc aatcggtaat taaataacaa aaggaaaaaacttaagatgc 3240 ttcaaccccg tttcgtgaca ctttgaaaaa agaatcacct cttgcaaacacccgctcccg 3300 acccccgccg ctgaagcccg gcgtccagag gcctaagcgc gggtgcccgcccccacccgg 3360 gagcgcgggc ctcgtggtca gcgcatccgc ggggagaaac aaaggccgcggcacgggggc 3420 tcaagggcac tgcgccacac cgcacgcgcc tacccccgcg cggccacgttaactggcggt 3480 cgccgcagcc tcgggacagc cggccgcgcg ccgccaggct cgcggacgcgggaccacgcg 3540 ccgccctccg ggaggcccaa gtctcgaccc agccccgcgt ggcgctgggggagggggcgc 3600 ctccgccgga acgcgggtgg gggaggggag ggggaaatgc gctttgtctcgaaatggggc 3660 aaccgtcgcc acagctccct accccctcga gggcagagca gtccccccactaactaccgg 3720 gctggccgcg cgccaggcca gccgcgaggc caccgcccga ccctccactccttcccgcag 3780 ctcccggcgc ggggtccggc gagaagggga ggggagggga gcggagaaccgggcccccgg 3840 gacgcgtgtg gcatctgaag caccaccagc gagcgagagc tagagagaaggaaagccacc 3900 gacttcaccg cctccgagct gctccgggtc gcgggtctgc 
agcgtctccggccctccgcg 3960 cctacagctc aagccacatc cgaaggggga gggagccggg agctgcgcgcggggccgccg 4020 gggggagggg tggcaccgcc cacgccgggc ggccacgaag ggcggggcagcgggcgcgcg 4080 cgcggcgggg ggaggggccg gcgccgcgcc cgctgggaat tggggccctagggggagggc 4140 ggaggcgccg acgaccgcgg cacttaccgt tcgcggcgtg gcgcccggtggtccccaagg 4200 ggagggaagg gggaggcggg gcgaggacag tgaccggagt ctcctcagcggtggcttttc 4260 tgcttggcag cctcagcggc tggcgccaaa accggactcc gcccacttcctcgcccgccg 4320 gtgcgagggt gtggaatcct ccagacgctg ggggaggggg agttgggagcttaaaaacta 4380 gtaccccttt gggaccactt tcagcagcga actctcctgt acaccaggggtcagttccac 4440 agacgcgggc caggggtggg tcattgcggc gtgaacaata atttgactagaagttgattc 4500 gggtgtttcc ggaaggggcc gagtcaatcc gccgagttgg ggcacggaaaacaaaaaggg 4560 aaggctacta agatttttct ggcgggggtt atcattggcg taactgcagggaccacctcc 4620 cgggttgagg gggctggatc tccaggctgc ggattaagcc cctcccgtcggcgttaattt 4680 caaactgcgc gacgtttctc acctgccttc gccaaggcag gggccgggaccctattccaa 4740 gaggtagtaa ctagcaggac tctagccttc cgcaattcat tgagcgcatttacggaagta 4800 acgtcgggta ctgtctctgg ccgcaagggt gggaggagta cgcatttggcgtaaggtggg 4860 gcgtagagcc ttcccgccat tggcggcgga tagggcgttt acgcgacggcctgacgtagc 4920 ggaagacgcg ttagtggggg ggaaggttct agaaaagcgg cggcagcggctctagcggca 4980 gtagcagcag cgccgggtcc cgtgcggagg tgctcctcgc agagttgtttctcgagcagc 5040 ggcagttctc actacagcgc caggacgagt ccggttcgtg ttcgtccgcggagatctctc 5100 tcatctcgct cggctgcggg aaatcgggct gaagcgactg agtccgcgatggaggtaacg 5160 ggtttgaaat caatgagtta ttgaaaaggg catggcgagg ccgttggcgcctcagtggaa 5220 gtcggccagc cgcctccgtg ggagagaggc aggaaatcgg accaattcagtagcagtggg 5280 gcttaaggtt tatgaacggg gtcttgagcg gaggcctgag cgtacaaacagcttccccac 5340 cctcagcctc ccggcgccat ttcccttcac tgggggtggg ggatggggagctttcacatg 5400 gcggacgctg ccccgctggg gtgaaagtgg ggcgcggagg cgggaattcttattcccttt 5460 ctaaagcacg ctgcttcggg ggccacggcg tctcctcggc gagcgtttcggcgggcagca 5520 ggtcctcgtg agcgaggctg cggagcttcc cctccccctc tctcccgggaaccgatttgg 5580 cggccgccat tttcatggct cgccttcctc tcagcgtttt ccttataactcttttatttt 5640 cttagtgtgc 
tttctctatc aagaagtaga agtggttaac tattttttttttcttctcgg 5700 gctgttttca tatcgtttcg aggtggattt ggagtgtttt gtgagcttggatctttagag 5760 tcctgcgcac ctcattaaag gcgctcagcc ttcccctcga tgaaatggcgccattgcgtt 5820 cggaagccac accgaagagc ggggaggggg ggtgctccgg gtttgcgggcccggtttcag 5880 agaagatatc accacccagg gcgtcgggcc gggttcaatg cgagccgtaggacaaagaaa 5940 ccattttatg tttttcctgt cttttttttc ctttgagtaa cggttttatctgggtctgca 6000 gtcagtaaaa cgacagatga accgcggcaa aataaacata aattggaagccatcggccac 6060 gaggggcagg gacgaaggtg gttttctggg cgggggaggg atattcgcgtcagaatcctt 6120 tactgttctt aaggattccg tttaagttgt agagctgact cattttaagtaatgttgtta 6180 ctgagaagtt taacccttac gggacagatc catggacctt tatagatgattacgaggaaa 6240 gtgaaataac gattttgtcc ttagttatac ttcgattaaa acatggcttcagaggctcct 6300 tcctgtaatg cgtatggatt gatgtgcaaa actgttttgg gcctgggccgctctgtattt 6360 gaactttgtt acttttctca ttttgtttgc aatcttggtt gaacattacattgataagca 6420 taaggtctca agcgaagggg gtctacctgg ttatttttct ttgaccctaagcacgtttat 6480 aaaataacat tgtttaaaat cgatagtgga catcgggtaa gtttggataaattgtgaggt 6540 aagtaatgag tttttgcttt ttgttagtga tttgtaaaac ttgttataaatgtacattat 6600 ccgtaatttc agtttagaga taacctatgt gctgacgaca attaagaataaaaactagct 6660 gaaaaaatga aaataactat cgtgacaagt aaccatttca aaagactgctttgtgtctca 6720 taggagctag tttgatcatt tcagttaatt ttttctttaa tttttacgagtcatgaaaac 6780 tacaggaaaa aaaatctgaa ctgggtttta ccactacttt ttaggagttgggagcatgcg 6840 aatggaggga gagctccgta gaactgggat gagagcagca attaatgctgcttgctagga 6900 acaaaaaata attgattgaa aattacgtgt gactttttag tttgcattatgcgtttgtag 6960 cagttggtcc tggatatcac tttctctcgt ttgaggtttt ttaacctagttaacttttaa 7020 gacaggtttc cttaacattc ataagtgccc agaatacagc tgtgtagtacagcatataaa 7080 gatttcagct ctgaggtttt tcctattgac ttggaaaatt gttttgtgcctgtcgcttgc 7140 cacatggcca atcaagtaag cttcgaattc gagctcgccc aactccgcccgttttatgac 7200 tagaaccaat agtttttaat gccaaatgca ctgaaatccc ctaatttgcaaagccaaacg 7260 ccccctatgt gagtaatacg gggacttttt acccaatttc ccaagcggaaagccccctaa 7320 tacactcata tggcatatga atcagcacgg tcatgcactc 
taatggcggcccatagggac 7380 tttccacata gggggcgttc accatttccc agcatagggg tggtgactcaatggccttta 7440 cccaagtaca ttgggtcaat gggaggtaag ccaatgggtt tttcccattactggcaagca 7500 cactgagtca aatgggactt tccactgggt tttgcccaag tacattgggtcaatgggagg 7560 tgagccaatg ggaaaaaccc attgctgcca agtacactga ctcaatagggactttccaat 7620 gggtttttcc attgttggca agcatataag gtcaatgtgg gtgagtcaatagggactttc 7680 cattgtattc tgcccagtac ataaggtcaa tagggggtga atcaacaggaaagtcccatt 7740 ggagccaagt acactgcgtc aatagggact ttccattggg ttttgcccagtacataaggt 7800 caatagggga tgagtcaatg ggaaaaaccc attggagcca agtacactgactcaataggg 7860 actttccatt gggttttgcc cagtacataa ggtcaatagg gggtgagtcaacaggaaagt 7920 cccattggag ccaagtacat tgagtcaata gggactttcc aatgggttttgcccagtaca 7980 taaggtcaat gggaggtaag ccaatgggtt tttcccatta ctggcacgtatactgagtca 8040 ttagggactt tccaatgggt tttgcccagt acataaggtc aataggggtgaatcaacagg 8100 aaagtcccat tggagccaag tacactgagt caatagggac tttccattgggttttgccca 8160 gtacaaaagg tcaatagggg gtgagtcaat gggtttttcc cattattggcacgtacataa 8220 ggtcaatagg ggtgagtcat tgggtttttc cagccaattt aattaaaacgccatgtactt 8280 tcccaccatt gacgtcaatg ggctattgaa actaatgcaa cgtgacctttaaacggtact 8340 ttcccatagc tgattaatgg gaaagtaccg ttctcgagcc aatacacgtcaatgggaagt 8400 gaaagggcag ccaaaacgta acaccgcccc ggttttcccc tggaaattccatattggcac 8460 gcattctatt ggctgagctg cgttctacgt gggtataaga ggcgcgaccagcgtcggtac 8520 cgtcgcagtc ttcggtctga ccaccgtaga acgcagagct cctcgctgcagcccgggtct 8580 agaggatccg cctgagaaag gaagtgagct gtaaaggctg agctctctctctgacgtatg 8640 tagcctctgg ttagcttcgt cactcactgt tcttgactca gcatggcaatctgatgaaat 8700 cccagctgta agtctgcaga aattgatgat ctattaaaca ataaagatgtccactaaaat 8760 ggaagttttt cctgtcatac tttgttaaga agggtgagaa cagagtacctacattttgaa 8820 tggaaggatt ggagctacgg gggtgggggt ggggtgggat tagataaatgcctgctcttt 8880 actgaaggct ctttactatt gctttatgat aatgtttcat agttggatatcataatttaa 8940 acaagcaaaa ccaaattaag ggccagctca ttcctccaga tccactagttctagagcaaa 9000 ttctaccggg taggggaggc gcttttccca aggcagtctg gagcatgcgctttagcagcc 9060 ccgctgggca 
cttggcgcta cacaagtggc ctctggcctc gcacacattccacatccacc 9120 ggtaggcgcc aaccggctcc gttctttggt ggccccttcg cgccaccttctactcctccc 9180 ctagtcagga agttcccccc cgccccgcag ctcgcgtcgt gcaggacgtgacaaatggaa 9240 gtagcacgtc tcactagtct cgtgcagatg gacagcaccg ctgagcaatggaagcgggta 9300 ggcctttggg gcagcggcca atagcagctt tgctccttcg ctttctgggctcagaggctg 9360 ggaaggggtg ggtccggggg cgggctcagg ggcgggctca ggggcggggcgggcgcccga 9420 aggtcctccg gaggcccggc attctgcacg cttcaaaagc gcacgtctgccgcgctgttc 9480 tcctcttcct catctccggg cctttcgacc agcttaccat gaccgagtacaagcccacgg 9540 tgcgcctcgc cacccgcgac gacgtcccca gggccgtacg caccctcgccgccgcgttcg 9600 ccgactaccc cgccacgcgc cacaccgtcg atccggaccg ccacatcgagcgggtcaccg 9660 agctgcaaga actcttcctc acgcgcgtcg ggctcgacat cggcaaggtgtgggtcgcgg 9720 acgacggcgc cgcggtggcg gtctggacca cgccggagag cgtcgaagcgggggcggtgt 9780 tcgccgagat cggcccgcgc atggccgagt tgagcggttc ccggctggccgcgcagcaac 9840 agatggaagg cctcctggcg ccgcaccggc ccaaggagcc cgcgtggttcctggccaccg 9900 tcggcgtctc gcccgaccac cagggcaagg gtctgggcag cgccgtcgtgctccccggag 9960 tggaggcggc cgagcgcgcc ggggtgcccg ccttcctgga gacctccgcgccccgcaacc 10020 tccccttcta cgagcggctc ggcttcaccg tcaccgccga cgtcgaggtgcccgaaggac 10080 cgcgcacctg gtgcatgacc cgcaagcccg gtgcctgacg cccgccccacgacccgcagc 10140 gcccgaccga aaggagcgca cgaccccatg catcgtagag ctcgctgatcagcctcgact 10200 gtgccttcta gttgccagcc atctgttgtt tgcccctccc ccgtgccttccttgaccctg 10260 gaaggtgcca ctcccactgt cctttcctaa taaaatgagg aaattgcatcgcattgtctg 10320 agtaggtgtc attctattct ggggggtggg gtggggcagg acagcaaggggggggattgg 10380 gragacaata gcaggcatgc tgggggggcg gtgggggcta tggcttctgaggcggaaaga 10440 accagctggg gctcgagatc cactagttct agcctcgagg ctagagcggcctgctctaga 10500 gcggccgcca ccgcggtgga gctccagctt ttgttccctt tagtgagggttaatttcgag 10560 cttggcgtaa tcatggtcat agctgtttcc tgtgtgaaat tgttatccgctcacaattcc 10620 acacaacata cgagccggaa gcataaagtg taaagcctgg ggtgcctaatgagtgagcta 10680 actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacctgtcgtgcca 10740 gggggtacct aggccgggca acaattggcg 
gccggccgca cttttcggggaaatgtgcgc 10800 ggaaccccta tttgtttatt tttctaaata cattcaaata tgtatccgctcatgagacaa 10860 taaccctgat aaatgcttca ataatattga aaaaggaaga gtatgagtattcaacatttc 10920 cgtgtcgccc ttattccctt ttttgcggca ttttgccttc ctgtttttgctcacccagaa 10980 acgctggtga aagtaaaaga tgctgaagat cagttgggtg cacgagtgggttacatcgaa 11040 ctggatctca acagcggtaa gatccttgag agttttcgcc ccgaagaacgttttccaatg 11100 atgagcactt ttaaagttct gctatgtggc gcggtattat cccgtattgacgccgggcaa 11160 gagcaactcg gtcgccgcat acactattct cagaatgact tggttgagtactcaccagtc 11220 acagaaaagc atcttacgga tggcatgaca gtaagagaat tatgcagtgctgccataacc 11280 atgagtgata acactgcggc caacttactt ctgacaacga tcggaggaccgaaggagcta 11340 accgcttttt tgcacaacat gggggatcat gtaactcgcc ttgatcgttgggaaccggag 11400 ctgaatgaag ccataccaaa cgacgagcgt gacaccacga tgcctgtagcaatggcaaca 11460 acgttgcgca aactattaac tggcgaacta cttactctag cttcccggcaacaattaata 11520 gactggatgg aggcggataa agttgcagga ccacttctgc gctcggcccttccggctggc 11580 tggtttattg ctgataaatc tggagccggt gagcgtgggt ctcgcggtatcattgcagca 11640 ctggggccag atggtaagcc ctcccgtatc gtagttatct acacgacggggagtcaggca 11700 actatggatg aacgaaatag acagatcgct gagataggtg cctcactgattaagcattgg 11760 taactgtcag accctaggcc gggcaacaat tggcggccgg ccctgcattaatgaatcggc 11820 caacgcgcgg ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctcgctcactgac 11880 tcgctgcgct cggtcgttcg gctgcggcga gcggtatcag ctcactcaaaggcggtaata 11940 cggttatcca cagaatcagg ggataacgca ggaaagaaca tgtgagcaaaaggccagcaa 12000 aaggccagga accgtaaaaa ggccgcgttg ctggcgtttt tccataggctccgcccccct 12060 gacgagcatc acaaaaatcg acgctcaagt cagaggtggc gaaacccgacaggactataa 12120 agataccagg cgtttccccc tggaagctcc ctcgtgcgct ctcctgttccgaccctgccg 12180 cttaccggat acctgtccgc ctttctccct tcgggaagcg tggcgctttctcatagctca 12240 cgctgtaggt atctcagttc ggtgtaggtc gttcgctcca agctgggctgtgtgcacgaa 12300 ccccccgttc agcccgaccg ctgcgcctta tccggtaact atcgtcttgagtccaacccg 12360 gtaagacacg acttatcgcc actggcagca gccactggta acaggattagcagagcgagg 12420 tatgtaggcg gtgctacaga gttcttgaag tggtggccta 
actacggctacactagaagg 12480 acagtatttg gtatctgcgc tctgctgaag ccagttacct tcggaaaaagagttggtagc 12540 tcttgatccg gcaaacaaac caccgctggt agcggtggtt tttttgtttgcaagcagcag 12600 attacgcgca gaaaaaaagg atctcaagaa gatcctttga tcttttctacggggtctgac 12660 gctcagtgga acgaaaactc 12680 4 12088 DNA ArtificialSequence Artificial Sequence containing human UCOE elements and vectorsequence 4 acgttgtaaa acgacggcca gtgaattgta atacgactca ctatagggcgaattgggtac 60 cgggcccccc ctcgaggtcg agttggggtg gggaaaagga agaaacgcgggcgtattggc 120 cccaatgggg tctcggtggg gtatcgacag agtgccagcc ctgggaccgaaccccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtc tttttattgccgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgt cgacggtatc aaggtggcgaccggaatggt 300 gagctgcgag aatagccggg cgcgctgtga gccgaagtcg cccccgccctggccacttcc 360 ggcgcgccga gtccttaggc cgccaggggg cgccggcgcg cgcccagattggggacaaag 420 gaagccgggc cggccgcgtt attaccataa aaggcaaaca ctggtcggaggcgtccccgc 480 ggcgcgcggc aggaagccag gccccaaccc cctcccaacc gggcgccagccccgcctccg 540 cccggttcaa acagcgaccg ggtcgcgcgc gcgcacgcag cggccacaccctcgggcgcc 600 agcggctcgg gcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgcgcgattagcg 660 ccattgagtc ccagcgcgca cgcgcaatta gcgccaattc ccagcgcgcacgcagttagc 720 gcccaaagga ccagcgcgca cgcgcatggc gccccagccc ccaccgggcctgacgggggc 780 tacgccgcgc ccaccgtgcg atccccattg gcaagagccc ggctcagacaaagaccccgc 840 cggttgcccc cgccccgaga gcggcacccc cggagcgcgc ccgcccgagcgcggcctcgc 900 gcctgcgaac tggcgtgggg tgtcccccat ctccggaggc ccaggggcttctcccgcgcc 960 ccccacggcg gtccggttcc gccccatgcg ccccccgctg cggcccagacggcggctctg 1020 cacgggcgaa gggccgcggc cgcatgcccc ggtcggctgg ccgggcttacctggcggcgg 1080 gtgtggacgg gcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcggacgcggtct 1140 cggcggtggt ggcgcgtcgc gccgctgggt tttatagggc gccgccgcggccgctcgagc 1200 cataaaaggc aactttcgga acggcgcacg ctgattggcc ccgcgccgctcactcaccgg 1260 cttcgccgca cagtgcagca tttttttacc ccctctcccc tccttttgcgaaaaaaaaaa 1320 agagcgagag cgagattgag gaagaggagg agggagagtt ttggcgttggccgccttggg 1380 gtgctgggcc cgggggctgg gggcgcgcgc 
cgtggccccc gcgccccacgctgggcagtg 1440 cccggttcgg ccccgcatgg ccaggcctgc ccccggcctg cccgtctctcgggcccccca 1500 cccaccgcgg gacatcctag gtgtggacat ctcttgggca ctgagcgcccaggtggggtg 1560 ggccagggtc tgcacgggtg ccagggccct gggttctgta cgctcctgcagaaggagctc 1620 ttggagggca tggagtggcc aggcagtcac tcccccttgc cgacttcagagcaactgccc 1680 tgaaagcagg gcctgaggac ctctggctgt ggggctcagc tagctaaatgtgctgggtgg 1740 gtcactaggg agagacctgg gcttgagagg tagagtgtgg tgttgggggagtcaggtggc 1800 ttgcggccat tagagtcgca ggaccacact ccccaggaca gggcaggggccagcggtcca 1860 gtggctggag gtggcccgtg atgaaggcta caaacctacc cagccgcagccctgggaagg 1920 aagtgggctc tacagggcag ggcacctttt accctggagc tgcctgcttttgagggtaac 1980 agtcacgccc agccaagacc aggcctgggg cgttagtggg tgacctaggcactgcggggc 2040 gggggggctg ggtctacaca gcctgggtct gggcccaccg tccgttgtatgtctgctatg 2100 cgcagccaca gctgaactgc cctcccagac catctggagg ccgctgggggactctgggga 2160 ccaagactcc atgtgccaca gaggattggg ggcggggcgg tgctaggaactcaaagccag 2220 cctgggaaga ccctgtcctt gtcacccttt cttgccttgg gtctgtccactgagtagcac 2280 acaagaccgg gtgggcaggg tccgttctgc tccgggaatc acagactgtgtgtacccagg 2340 tggtgggcat gcagcgatca gtggcgtggg accacagagg gggcccgcggtacctaaaac 2400 agcttcacat ggcttaaaat aggggaccaa tgtcttttcc aatctaagtcccatttataa 2460 taaagtccat gttccatttt taaaggacaa tcctttcggt ttaaaaccaggcacgattac 2520 ccaaacaact cacaacggta aagcactgtg aatcttctct gttctgcaatcccaacttgg 2580 tttctgctca gaaaccctcc ctctttccaa tcggtaatta aataacaaaaggaaaaaact 2640 taagatgctt caaccccgtt tcgtgacact ttgaaaaaag aatcacctcttgcaaacacc 2700 cgctcccgac ccccgccgct gaagcccggc gtccagaggc ctaagcgcgggtgcccgccc 2760 ccacccggga gcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaaaggccgcggc 2820 acgggggctc aagggcactg cgccacaccg cacgcgccta cccccgcgcggccacgttaa 2880 ctggcggtcg ccgcagcctc gggacagccg gccgcgcgcc gccaggctcgcggacgcggg 2940 accacgcgcc gccctccggg aggcccaagt ctcgacccag ccccgcgtggcgctggggga 3000 gggggcgcct ccgccggaac gcgggtgggg gaggggaggg ggaaatgcgctttgtctcga 3060 aatggggcaa ccgtcgccac agctccctac cccctcgagg gcagagcagtccccccacta 3120 
actaccgggc tggccgcgcg ccaggccagc cgcgaggcca ccgcccgaccctccactcct 3180 tcccgcagct cccggcgcgg ggtccggcga gaaggggagg ggaggggagcggagaaccgg 3240 gcccccggga cgcgtgtggc atctgaagca ccaccagcga gcgagagctagagagaagga 3300 aagccaccga cttcaccgcc tccgagctgc tccgggtcgc gggtctgcagcgtctccggc 3360 cctccgcgcc tacagctcaa gccacatccg aagggggagg gagccgggagctgcgcgcgg 3420 ggccgccggg gggaggggtg gcaccgccca cgccgggcgg ccacgaagggcggggcagcg 3480 ggcgcgcgcg cggcgggggg aggggccggc gccgcgcccg ctgggaattggggccctagg 3540 gggagggcgg aggcgccgac gaccgcggca cttaccgttc gcggcgtggcgcccggtggt 3600 ccccaagggg agggaagggg gaggcggggc gaggacagtg accggagtctcctcagcggt 3660 ggcttttctg cttggcagcc tcagcggctg gcgccaaaac cggactccgcccacttcctc 3720 gcccgccggt gcgagggtgt ggaatcctcc agacgctggg ggagggggagttgggagctt 3780 aaaaactagt acccctttgg gaccactttc agcagcgaac tctcctgtacaccaggggtc 3840 agttccacag acgcgggcca ggggtgggtc attgcggcgt gaacaataatttgactagaa 3900 gttgattcgg gtgtttccgg aaggggccga gtcaatccgc cgagttggggcacggaaaac 3960 aaaaagggaa ggctactaag atttttctgg cgggggttat cattggcgtaactgcaggga 4020 ccacctcccg ggttgagggg gctggatctc caggctgcgg attaagcccctcccgtcggc 4080 gttaatttca aactgcgcga cgtttctcac ctgccttcgc caaggcaggggccgggaccc 4140 tattccaaga ggtagtaact agcaggactc tagccttccg caattcattgagcgcattta 4200 cggaagtaac gtcgggtact gtctctggcc gcaagggtgg gaggagtacgcatttggcgt 4260 aaggtggggc gtagagcctt cccgccattg gcggcggata gggcgtttacgcgacggcct 4320 gacgtagcgg aagacgcgtt agtggggggg aaggttctag aaaagcggcggcagcggctc 4380 tagcggcagt agcagcagcg ccgggtcccg tgcggaggtg ctcctcgcagagttgtttct 4440 cgagcagcgg cagttctcac tacagcgcca ggacgagtcc ggttcgtgttcgtccgcgga 4500 gatctctctc atctcgctcg gctgcgggaa atcgggctga agcgactgagtccgcgatgg 4560 aggtaacggg tttgaaatca atgagttatt gaaaagggca tggcgaggccgttggcgcct 4620 cagtggaagt cggccagccg cctccgtggg agagaggcag gaaatcggaccaattcagta 4680 gcagtggggc ttaaggttta tgaacggggt cttgagcgga ggcctgagcgtacaaacagc 4740 ttccccaccc tcagcctccc ggcgccattt cccttcactg ggggtgggggatggggagct 4800 ttcacatggc ggacgctgcc ccgctggggt 
gaaagtgggg cgcggaggcgggaattctta 4860 ttccctttct aaagcacgct gcttcggggg ccacggcgtc tcctcggcgagcgtttcggc 4920 gggcagcagg tcctcgtgag cgaggctgcg gagcttcccc tccccctctctcccgggaac 4980 cgatttggcg gccgccattt tcatggctcg ccttcctctc agcgttttccttataactct 5040 tttattttct tagtgtgctt tctctatcaa gaagtagaag tggttaactatttttttttt 5100 cttctcgggc tgttttcata tcgtttcgag gtggatttgg agtgttttgtgagcttggat 5160 ctttagagtc ctgcgcacct cattaaaggc gctcagcctt cccctcgatgaaatggcgcc 5220 attgcgttcg gaagccacac cgaagagcgg ggaggggggg tgctccgggtttgcgggccc 5280 ggtttcagag aagatatcac cacccagggc gtcgggccgg gttcaatgcgagccgtagga 5340 caaagaaacc attttatgtt tttcctgtct tttttttcct ttgagtaacggttttatctg 5400 ggtctgcagt cagtaaaacg acagatgaac cgcggcaaaa taaacataaattggaagcca 5460 tcggccacga ggggcaggga cgaaggtggt tttctgggcg ggggagggatattcgcgtca 5520 gaatccttta ctgttcttaa ggattccgtt taagttgtag agctgactcattttaagtaa 5580 tgttgttact gagaagttta acccttacgg gacagatcca tggacctttatagatgatta 5640 cgaggaaagt gaaataacga ttttgtcctt agttatactt cgattaaaacatggcttcag 5700 aggctccttc ctgtaatgcg tatggattga tgtgcaaaac tgttttgggcctgggccgct 5760 ctgtatttga actttgttac ttttctcatt ttgtttgcaa tcttggttgaacattacatt 5820 gataagcata aggtctcaag cgaagggggt ctacctggtt atttttctttgaccctaagc 5880 acgtttataa aataacattg tttaaaatcg atagtggaca tcgggtaagtttggataaat 5940 tgtgaggtaa gtaatgagtt tttgcttttt gttagtgatt tgtaaaacttgttataaatg 6000 tacattatcc gtaatttcag tttagagata acctatgtgc tgacgacaattaagaataaa 6060 aactagctga aaaaatgaaa ataactatcg tgacaagtaa ccatttcaaaagactgcttt 6120 gtgtctcata ggagctagtt tgatcatttc agttaatttt ttctttaatttttacgagtc 6180 atgaaaacta caggaaaaaa aatctgaact gggttttacc actactttttaggagttggg 6240 agcatgcgaa tggagggaga gctccgtaga actgggatga gagcagcaattaatgctgct 6300 tgctaggaac aaaaaataat tgattgaaaa ttacgtgtga ctttttagtttgcattatgc 6360 gtttgtagca gttggtcctg gatatcactt tctctcgttt gaggttttttaacctagtta 6420 acttttaaga caggtttcct taacattcat aagtgcccag aatacagctgtgtagtacag 6480 catataaaga tttcagctct gaggtttttc ctattgactt ggaaaattgttttgtgcctg 6540 
tcgcttgcca catggccaat caagtaagct tcgaattcga gctcgcccaactccgcccgt 6600 tttatgacta gaaccaatag tttttaatgc caaatgcact gaaatcccctaatttgcaaa 6660 gccaaacgcc ccctatgtga gtaatacggg gactttttac ccaatttcccaagcggaaag 6720 ccccctaata cactcatatg gcatatgaat cagcacggtc atgcactctaatggcggccc 6780 atagggactt tccacatagg gggcgttcac catttcccag cataggggtggtgactcaat 6840 ggcctttacc caagtacatt gggtcaatgg gaggtaagcc aatgggtttttcccattact 6900 ggcaagcaca ctgagtcaaa tgggactttc cactgggttt tgcccaagtacattgggtca 6960 atgggaggtg agccaatggg aaaaacccat tgctgccaag tacactgactcaatagggac 7020 tttccaatgg gtttttccat tgttggcaag catataaggt caatgtgggtgagtcaatag 7080 ggactttcca ttgtattctg cccagtacat aaggtcaata gggggtgaatcaacaggaaa 7140 gtcccattgg agccaagtac actgcgtcaa tagggacttt ccattgggttttgcccagta 7200 cataaggtca ataggggatg agtcaatggg aaaaacccat tggagccaagtacactgact 7260 caatagggac tttccattgg gttttgccca gtacataagg tcaatagggggtgagtcaac 7320 aggaaagtcc cattggagcc aagtacattg agtcaatagg gactttccaatgggttttgc 7380 ccagtacata aggtcaatgg gaggtaagcc aatgggtttt tcccattactggcacgtata 7440 ctgagtcatt agggactttc caatgggttt tgcccagtac ataaggtcaataggggtgaa 7500 tcaacaggaa agtcccattg gagccaagta cactgagtca atagggactttccattgggt 7560 tttgcccagt acaaaaggtc aatagggggt gagtcaatgg gtttttcccattattggcac 7620 gtacataagg tcaatagggg tgagtcattg ggtttttcca gccaatttaattaaaacgcc 7680 atgtactttc ccaccattga cgtcaatggg ctattgaaac taatgcaacgtgacctttaa 7740 acggtacttt cccatagctg attaatggga aagtaccgtt ctcgagccaatacacgtcaa 7800 tgggaagtga aagggcagcc aaaacgtaac accgccccgg ttttcccctggaaattccat 7860 attggcacgc attctattgg ctgagctgcg ttctacgtgg gtataagaggcgcgaccagc 7920 gtcggtaccg tcgcagtctt cggtctgacc accgtagaac gcagagctcctcgctgcagc 7980 ccgggtctag aggatccgcc tgagaaagga agtgagctgt aaaggctgagctctctctct 8040 gacgtatgta gcctctggtt agcttcgtca ctcactgttc ttgactcagcatggcaatct 8100 gatgaaatcc cagctgtaag tctgcagaaa ttgatgatct attaaacaataaagatgtcc 8160 actaaaatgg aagtttttcc tgtcatactt tgttaagaag ggtgagaacagagtacctac 8220 attttgaatg gaaggattgg agctacgggg 
gtgggggtgg ggtgggattagataaatgcc 8280 tgctctttac tgaaggctct ttactattgc tttatgataa tgtttcatagttggatatca 8340 taatttaaac aagcaaaacc aaattaaggg ccagctcatt cctccagatccactagttct 8400 agagcaaatt ctaccgggta ggggaggcgc ttttcccaag gcagtctggagcatgcgctt 8460 tagcagcccc gctgggcact tggcgctaca caagtggcct ctggcctcgcacacattcca 8520 catccaccgg taggcgccaa ccggctccgt tctttggtgg ccccttcgcgccaccttcta 8580 ctcctcccct agtcaggaag ttcccccccg ccccgcagct cgcgtcgtgcaggacgtgac 8640 aaatggaagt agcacgtctc actagtctcg tgcagatgga cagcaccgctgagcaatgga 8700 agcgggtagg cctttggggc agcggccaat agcagctttg ctccttcgctttctgggctc 8760 agaggctggg aaggggtggg tccgggggcg ggctcagggg cgggctcaggggcggggcgg 8820 gcgcccgaag gtcctccgga ggcccggcat tctgcacgct tcaaaagcgcacgtctgccg 8880 cgctgttctc ctcttcctca tctccgggcc tttcgaccag cttaccatgaccgagtacaa 8940 gcccacggtg cgcctcgcca cccgcgacga cgtccccagg gccgtacgcaccctcgccgc 9000 cgcgttcgcc gactaccccg ccacgcgcca caccgtcgat ccggaccgccacatcgagcg 9060 ggtcaccgag ctgcaagaac tcttcctcac gcgcgtcggg ctcgacatcggcaaggtgtg 9120 ggtcgcggac gacggcgccg cggtggcggt ctggaccacg ccggagagcgtcgaagcggg 9180 ggcggtgttc gccgagatcg gcccgcgcat ggccgagttg agcggttcccggctggccgc 9240 gcagcaacag atggaaggcc tcctggcgcc gcaccggccc aaggagcccgcgtggttcct 9300 ggccaccgtc ggcgtctcgc ccgaccacca gggcaagggt ctgggcagcgccgtcgtgct 9360 ccccggagtg gaggcggccg agcgcgccgg ggtgcccgcc ttcctggagacctccgcgcc 9420 ccgcaacctc cccttctacg agcggctcgg cttcaccgtc accgccgacgtcgaggtgcc 9480 cgaaggaccg cgcacctggt gcatgacccg caagcccggt gcctgacgcccgccccacga 9540 cccgcagcgc ccgaccgaaa ggagcgcacg accccatgca tcgtagagctcgctgatcag 9600 cctcgactgt gccttctagt tgccagccat ctgttgtttg cccctcccccgtgccttcct 9660 tgaccctgga aggtgccact cccactgtcc tttcctaata aaatgaggaaattgcatcgc 9720 attgtctgag taggtgtcat tctattctgg ggggtggggt ggggcaggacagcaaggggg 9780 gggattgggr agacaatagc aggcatgctg ggggggcggt gggggctatggcttctgagg 9840 cggaaagaac cagctggggc tcgagatcca ctagttctag cctcgaggctagagcggcct 9900 gctctagagc ggccgccacc gcggtggagc tccagctttt gttccctttagtgagggtta 9960 
atttcgagct tggcgtaatc atggtcatag ctgtttcctg tgtgaaattgttatccgctc 10020 acaattccac acaacatacg agccggaagc ataaagtgta aagcctggggtgcctaatga 10080 gtgagctaac tcacattaat tgcgttgcgc tcactgcccg ctttccagtcgggaaacctg 10140 tcgtgccagg gggtacctag gccgggcaac aattggcggc cggccgcacttttcggggaa 10200 atgtgcgcgg aacccctatt tgtttatttt tctaaataca ttcaaatatgtatccgctca 10260 tgagacaata accctgataa atgcttcaat aatattgaaa aaggaagagtatgagtattc 10320 aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcctgtttttgctc 10380 acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgcacgagtgggtt 10440 acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgccccgaagaacgtt 10500 ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcccgtattgacg 10560 ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttggttgagtact 10620 caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaattatgcagtgctg 10680 ccataaccat gagtgataac actgcggcca acttacttct gacaacgatcggaggaccga 10740 aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgccttgatcgttggg 10800 aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatgcctgtagcaa 10860 tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagcttcccggcaac 10920 aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgctcggcccttc 10980 cggctggctg gtttattgct gataaatctg gagccggtga gcgtgggtctcgcggtatca 11040 ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctacacgacgggga 11100 gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcctcactgatta 11160 agcattggta actgtcagac cctaggccgg gcaacaattg gcggccggccctgcattaat 11220 gaatcggcca acgcgcgggg agaggcggtt tgcgtattgg gcgctcttccgcttcctcgc 11280 tcactgactc gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagctcactcaaagg 11340 cggtaatacg gttatccaca gaatcagggg ataacgcagg aaagaacatgtgagcaaaag 11400 gccagcaaaa ggccaggaac cgtaaaaagg ccgcgttgct ggcgtttttccataggctcc 11460 gcccccctga cgagcatcac aaaaatcgac gctcaagtca gaggtggcgaaacccgacag 11520 gactataaag ataccaggcg tttccccctg gaagctccct cgtgcgctctcctgttccga 11580 ccctgccgct taccggatac ctgtccgcct ttctcccttc gggaagcgtggcgctttctc 11640 atagctcacg 
ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaagctgggctgtg 11700 tgcacgaacc ccccgttcag cccgaccgct gcgccttatc cggtaactatcgtcttgagt 11760 ccaacccggt aagacacgac ttatcgccac tggcagcagc cactggtaacaggattagca 11820 gagcgaggta tgtaggcggt gctacagagt tcttgaagtg gtggcctaactacggctaca 11880 ctagaaggac agtatttggt atctgcgctc tgctgaagcc agttaccttcggaaaaagag 11940 ttggtagctc ttgatccggc aaacaaacca ccgctggtag cggtggtttttttgtttgca 12000 agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatcttttctacgg 12060 ggtctgacgc tcagtggaac gaaaactc 12088 5 12704 DNAArtificial Sequence Artificial Sequence containing human UCOE elementsand vector sequence 5 acgttgtaaa acgacggcca gtgaattgta atacgactcactatagggcg aattgggtac 60 cgggcccccc ctcgaggtcg agttggggtg gggaaaaggaagaaacgcgg gcgtattggc 120 cccaatgggg tctcggtggg gtatcgacag agtgccagccctgggaccga accccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtctttttattgc cgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgt cgactctagacccgggctgc agcgaggagc 300 tctgcgttct acggtggtca gaccgaagac tgcgacggtaccgacgctgg tcgcgcctct 360 tatacccacg tagaacgcag ctcagccaat agaatgcgtgccaatatgga atttccaggg 420 gaaaaccggg gcggtgttac gttttggctg ccctttcacttcccattgac gtgtattggc 480 tcgagaacgg tactttccca ttaatcagct atgggaaagtaccgtttaaa ggtcacgttg 540 cattagtttc aatagcccat tgacgtcaat ggtgggaaagtacatggcgt tttaattaaa 600 ttggctggaa aaacccaatg actcacccct attgaccttatgtacgtgcc aataatggga 660 aaaacccatt gactcacccc ctattgacct tttgtactgggcaaaaccca atggaaagtc 720 cctattgact cagtgtactt ggctccaatg ggactttcctgttgattcac ccctattgac 780 cttatgtact gggcaaaacc cattggaaag tccctaatgactcagtatac gtgccagtaa 840 tgggaaaaac ccattggctt acctcccatt gaccttatgtactgggcaaa acccattgga 900 aagtccctat tgactcaatg tacttggctc caatgggactttcctgttga ctcaccccct 960 attgacctta tgtactgggc aaaacccaat ggaaagtccctattgagtca gtgtacttgg 1020 ctccaatggg tttttcccat tgactcatcc cctattgaccttatgtactg ggcaaaaccc 1080 aatggaaagt ccctattgac gcagtgtact tggctccaatgggactttcc tgttgattca 1140 ccccctattg accttatgta ctgggcagaa tacaatggaaagtccctatt 
gactcaccca 1200 cattgacctt atatgcttgc caacaatgga aaaacccattggaaagtccc tattgagtca 1260 gtgtacttgg cagcaatggg tttttcccat tggctcacctcccattgacc caatgtactt 1320 gggcaaaacc cagtggaaag tcccatttga ctcagtgtgcttgccagtaa tgggaaaaac 1380 ccattggctt acctcccatt gacccaatgt acttgggtaaaggccattga gtcaccaccc 1440 ctatgctggg aaatggtgaa cgccccctat gtggaaagtccctatgggcc gccattagag 1500 tgcatgaccg tgctgattca tatgccatat gagtgtattagggggctttc cgcttgggaa 1560 attgggtaaa aagtccccgt attactcaca tagggggcgtttggctttgc aaattagggg 1620 atttcagtgc atttggcatt aaaaactatt ggttctagtcataaaacggg cggagttggg 1680 cgagctcgaa ttcaaacgac tcgacggtat caaggtggcgaccggaatgg tgagctgcga 1740 gaatagccgg gcgcgctgtg agccgaagtc gcccccgccctggccacttc cggcgcgccg 1800 agtccttagg ccgccagggg gcgccggcgc gcgcccagattggggacaaa ggaagccggg 1860 ccggccgcgt tattaccata aaaggcaaac actggtcggaggcgtccccg cggcgcgcgg 1920 caggaagcca ggccccaacc ccctcccaac cgggcgccagccccgcctcc gcccggttca 1980 aacagcgacc gggtcgcgcg cgcgcacgca gcggccacaccctcgggcgc cagcggctcg 2040 ggcaggaagt ggcgcaagcg cccgggcccc agaacgcacgcgcgattagc gccattgagt 2100 cccagcgcgc acgcgcaatt agcgccaatt cccagcgcgcacgcagttag cgcccaaagg 2160 accagcgcgc acgcgcatgg cgccccagcc cccaccgggcctgacggggg ctacgccgcg 2220 cccaccgtgc gatccccatt ggcaagagcc cggctcagacaaagaccccg ccggttgccc 2280 ccgccccgag agcggcaccc ccggagcgcg cccgcccgagcgcggcctcg cgcctgcgaa 2340 ctggcgtggg gtgtccccca tctccggagg cccaggggcttctcccgcgc cccccacggc 2400 ggtccggttc cgccccatgc gccccccgct gcggcccagacggcggctct gcacgggcga 2460 agggccgcgg ccgcatgccc cggtcggctg gccgggcttacctggcggcg ggtgtggacg 2520 ggcggcggat cggcaaaggc gaggctctgt gctcgcgggcggacgcggtc tcggcggtgg 2580 tggcgcgtcg cgccgctggg ttttataggg cgccgccgcggccgctcgag ccataaaagg 2640 caactttcgg aacggcgcac gctgattggc cccgcgccgctcactcaccg gcttcgccgc 2700 acagtgcagc atttttttac cccctctccc ctccttttgcgaaaaaaaaa aagagcgaga 2760 gcgagattga ggaagaggag gagggagagt tttggcgttggccgccttgg ggtgctgggc 2820 ccgggggctg ggggcgcgcg ccgtggcccc cgcgccccacgctgggcagt gcccggttcg 2880 gccccgcatg gccaggcctg 
cccccggcct gcccgtctctcgggcccccc acccaccgcg 2940 ggacatccta ggtgtggaca tctcttgggc actgagcgcccaggtggggt gggccagggt 3000 ctgcacgggt gccagggccc tgggttctgt acgctcctgcagaaggagct cttggagggc 3060 atggagtggc caggcagtca ctcccccttg ccgacttcagagcaactgcc ctgaaagcag 3120 ggcctgagga cctctggctg tggggctcag ctagctaaatgtgctgggtg ggtcactagg 3180 gagagacctg ggcttgagag gtagagtgtg gtgttgggggagtcaggtgg cttgcggcca 3240 ttagagtcgc aggaccacac tccccaggac agggcaggggccagcggtcc agtggctgga 3300 ggtggcccgt gatgaaggct acaaacctac ccagccgcagccctgggaag gaagtgggct 3360 ctacagggca gggcaccttt taccctggag ctgcctgcttttgagggtaa cagtcacgcc 3420 cagccaagac caggcctggg gcgttagtgg gtgacctaggcactgcgggg cgggggggct 3480 gggtctacac agcctgggtc tgggcccacc gtccgttgtatgtctgctat gcgcagccac 3540 agctgaactg ccctcccaga ccatctggag gccgctgggggactctgggg accaagactc 3600 catgtgccac agaggattgg gggcggggcg gtgctaggaactcaaagcca gcctgggaag 3660 accctgtcct tgtcaccctt tcttgccttg ggtctgtccactgagtagca cacaagaccg 3720 ggtgggcagg gtccgttctg ctccgggaat cacagactgtgtgtacccag gtggtgggca 3780 tgcagcgatc agtggcgtgg gaccacagag ggggcccgcggtacctaaaa cagcttcaca 3840 tggcttaaaa taggggacca atgtcttttc caatctaagtcccatttata ataaagtcca 3900 tgttccattt ttaaaggaca atcctttcgg tttaaaaccaggcacgatta cccaaacaac 3960 tcacaacggt aaagcactgt gaatcttctc tgttctgcaatcccaacttg gtttctgctc 4020 agaaaccctc cctctttcca atcggtaatt aaataacaaaaggaaaaaac ttaagatgct 4080 tcaaccccgt ttcgtgacac tttgaaaaaa gaatcacctcttgcaaacac ccgctcccga 4140 cccccgccgc tgaagcccgg cgtccagagg cctaagcgcgggtgcccgcc cccacccggg 4200 agcgcgggcc tcgtggtcag cgcatccgcg gggagaaacaaaggccgcgg cacgggggct 4260 caagggcact gcgccacacc gcacgcgcct acccccgcgcggccacgtta actggcggtc 4320 gccgcagcct cgggacagcc ggccgcgcgc cgccaggctcgcggacgcgg gaccacgcgc 4380 cgccctccgg gaggcccaag tctcgaccca gccccgcgtggcgctggggg agggggcgcc 4440 tccgccggaa cgcgggtggg ggaggggagg gggaaatgcgctttgtctcg aaatggggca 4500 accgtcgcca cagctcccta ccccctcgag ggcagagcagtccccccact aactaccggg 4560 ctggccgcgc gccaggccag ccgcgaggcc accgcccgaccctccactcc 
ttcccgcagc 4620 tcccggcgcg gggtccggcg agaaggggag gggaggggagcggagaaccg ggcccccggg 4680 acgcgtgtgg catctgaagc accaccagcg agcgagagctagagagaagg aaagccaccg 4740 acttcaccgc ctccgagctg ctccgggtcg cgggtctgcagcgtctccgg ccctccgcgc 4800 ctacagctca agccacatcc gaagggggag ggagccgggagctgcgcgcg gggccgccgg 4860 ggggaggggt ggcaccgccc acgccgggcg gccacgaagggcggggcagc gggcgcgcgc 4920 gcggcggggg gaggggccgg cgccgcgccc gctgggaattggggccctag ggggagggcg 4980 gaggcgccga cgaccgcggc acttaccgtt cgcggcgtggcgcccggtgg tccccaaggg 5040 gagggaaggg ggaggcgggg cgaggacagt gaccggagtctcctcagcgg tggcttttct 5100 gcttggcagc ctcagcggct ggcgccaaaa ccggactccgcccacttcct cgcccgccgg 5160 tgcgagggtg tggaatcctc cagacgctgg gggagggggagttgggagct taaaaactag 5220 tacccctttg ggaccacttt cagcagcgaa ctctcctgtacaccaggggt cagttccaca 5280 gacgcgggcc aggggtgggt cattgcggcg tgaacaataatttgactaga agttgattcg 5340 ggtgtttccg gaaggggccg agtcaatccg ccgagttggggcacggaaaa caaaaaggga 5400 aggctactaa gatttttctg gcgggggtta tcattggcgtaactgcaggg accacctccc 5460 gggttgaggg ggctggatct ccaggctgcg gattaagcccctcccgtcgg cgttaatttc 5520 aaactgcgcg acgtttctca cctgccttcg ccaaggcaggggccgggacc ctattccaag 5580 aggtagtaac tagcaggact ctagccttcc gcaattcattgagcgcattt acggaagtaa 5640 cgtcgggtac tgtctctggc cgcaagggtg ggaggagtacgcatttggcg taaggtgggg 5700 cgtagagcct tcccgccatt ggcggcggat agggcgtttacgcgacggcc tgacgtagcg 5760 gaagacgcgt tagtgggggg gaaggttcta gaaaagcggcggcagcggct ctagcggcag 5820 tagcagcagc gccgggtccc gtgcggaggt gctcctcgcagagttgtttc tcgagcagcg 5880 gcagttctca ctacagcgcc aggacgagtc cggttcgtgttcgtccgcgg agatctctct 5940 catctcgctc ggctgcggga aatcgggctg aagcgactgagtccgcgatg gaggtaacgg 6000 gtttgaaatc aatgagttat tgaaaagggc atggcgaggccgttggcgcc tcagtggaag 6060 tcggccagcc gcctccgtgg gagagaggca ggaaatcggaccaattcagt agcagtgggg 6120 cttaaggttt atgaacgggg tcttgagcgg aggcctgagcgtacaaacag cttccccacc 6180 ctcagcctcc cggcgccatt tcccttcact gggggtgggggatggggagc tttcacatgg 6240 cggacgctgc cccgctgggg tgaaagtggg gcgcggaggcgggaattctt attccctttc 6300 taaagcacgc tgcttcgggg 
gccacggcgt ctcctcggcgagcgtttcgg cgggcagcag 6360 gtcctcgtga gcgaggctgc ggagcttccc ctccccctctctcccgggaa ccgatttggc 6420 ggccgccatt ttcatggctc gccttcctct cagcgttttccttataactc ttttattttc 6480 ttagtgtgct ttctctatca agaagtagaa gtggttaactattttttttt tcttctcggg 6540 ctgttttcat atcgtttcga ggtggatttg gagtgttttgtgagcttgga tctttagagt 6600 cctgcgcacc tcattaaagg cgctcagcct tcccctcgatgaaatggcgc cattgcgttc 6660 ggaagccaca ccgaagagcg gggagggggg gtgctccgggtttgcgggcc cggtttcaga 6720 gaagatatca ccacccaggg cgtcgggccg ggttcaatgcgagccgtagg acaaagaaac 6780 cattttatgt ttttcctgtc ttttttttcc tttgagtaacggttttatct gggtctgcag 6840 tcagtaaaac gacagatgaa ccgcggcaaa ataaacataaattggaagcc atcggccacg 6900 aggggcaggg acgaaggtgg ttttctgggc gggggagggatattcgcgtc agaatccttt 6960 actgttctta aggattccgt ttaagttgta gagctgactcattttaagta atgttgttac 7020 tgagaagttt aacccttacg ggacagatcc atggacctttatagatgatt acgaggaaag 7080 tgaaataacg attttgtcct tagttatact tcgattaaaacatggcttca gaggctcctt 7140 cctgtaatgc gtatggattg atgtgcaaaa ctgttttgggcctgggccgc tctgtatttg 7200 aactttgtta cttttctcat tttgtttgca atcttggttgaacattacat tgataagcat 7260 aaggtctcaa gcgaaggggg tctacctggt tatttttctttgaccctaag cacgtttata 7320 aaataacatt gtttaaaatc gatagtggac atcgggtaagtttggataaa ttgtgaggta 7380 agtaatgagt ttttgctttt tgttagtgat ttgtaaaacttgttataaat gtacattatc 7440 cgtaatttca gtttagagat aacctatgtg ctgacgacaattaagaataa aaactagctg 7500 aaaaaatgaa aataactatc gtgacaagta accatttcaaaagactgctt tgtgtctcat 7560 aggagctagt ttgatcattt cagttaattt tttctttaatttttacgagt catgaaaact 7620 acaggaaaaa aaatctgaac tgggttttac cactactttttaggagttgg gagcatgcga 7680 atggagggag agctccgtag aactgggatg agagcagcaattaatgctgc ttgctaggaa 7740 caaaaaataa ttgattgaaa attacgtgtg actttttagtttgcattatg cgtttgtagc 7800 agttggtcct ggatatcact ttctctcgtt tgaggttttttaacctagtt aacttttaag 7860 acaggtttcc ttaacattca taagtgccca gaatacagctgtgtagtaca gcatataaag 7920 atttcagctc tgaggttttt cctattgact tggaaaattgttttgtgcct gtcgcttgcc 7980 acatggccaa tcaagtaagc ttattaatag taatcaattacggggtcatt 
agttcatagc 8040 ccatatatgg agttccgcgt tacataactt acggtaaatggcccgcctgg ctgaccgccc 8100 aacgaccccc gcccattgac gtcaataatg acgtatgttcccatagtaac gccaataggg 8160 actttccatt gacgtcaatg ggtggagtat ttacggtaaactgcccactt ggcagtacat 8220 caagtgtatc atatgccaag tacgccccct attgacgtcaatgacggtaa atggcccgcc 8280 tggcattatg cccagtacat gaccttatgg gactttcctacttggcagta catctacgta 8340 ttagtcatcg ctattaccat ggtgatgcgg ttttggcagtacatcaatgg gcgtggatag 8400 cggtttgact cacggggatt tccaagtctc caccccattgacgtcaatgg gagtttgttt 8460 tggcaccaaa atcaacggga ctttccaaaa tgtcgtaacaactccgcccc attgacgcaa 8520 atgggcggta ggcgtgtacg gtgggaggtc tatataagcagagctggttt agtgaaccgt 8580 cagatcggat ccgcctgaga aaggaagtga gctgtaaaggctgagctctc tctctgacgt 8640 atgtagcctc tggttagctt cgtcactcac tgttcttgactcagcatggc aatctgatga 8700 aatcccagct gtaagtctgc agaaattgat gatctattaaacaataaaga tgtccactaa 8760 aatggaagtt tttcctgtca tactttgtta agaagggtgagaacagagta cctacatttt 8820 gaatggaagg attggagcta cgggggtggg ggtggggtgggattagataa atgcctgctc 8880 tttactgaag gctctttact attgctttat gataatgtttcatagttgga tatcataatt 8940 taaacaagca aaaccaaatt aagggccagc tcattcctccagatccacta gtaattctgt 9000 ggaatgtgtg tcagttaggg tgtggaaagt ccccaggctccccagcaggc agaagtatgc 9060 aaagcatgca tctcaattag tcagcaacca ggtgtggaaagtccccaggc tccccagcag 9120 gcagaagtat gcaaagcatg catctcaatt agtcagcaaccatagtcccg cccctaactc 9180 cgcccatccc gcccctaact ccgcccagtt ccgcccattctccgccccat ggctgactaa 9240 ttttttttat ttatgcagag gccgaggccg cctctgcctctgagctattc cagaagtagt 9300 gaggaggctt ttttggaggc ctaggctttt gcaaaaagctcccgggagct tgtatatcca 9360 ttttcggatc tgatcaagag acaggatgag gatcgtttcgcatgattgaa caagatggat 9420 tgcacgcagg ttctccggcc gcttgggtgg agaggctattcggctatgac tgggcacaac 9480 agacaatcgg ctgctctgat gccgccgtgt tccggctgtcagcgcagggg cgcccggttc 9540 tttttgtcaa gaccgacctg tccggtgccc tgaatgaactgcaggacgag gcagcgcggc 9600 tatcstggct ggccacgacg ggcgttcctt gcgcagctgtgctcgacgtt gtcactgaag 9660 cgggaaggga ctggctgcta ttgggcgaag tgccggggcaggatctcctg tcatctcacc 9720 ttgctcctgc cgagaaagta 
tccatcatgg ctgatgcaatgcggcggctg catacgcttg 9780 atccggctac ctgcccattc gaccaccaag cgaaacatcgcatcgagcga gcacgtactc 9840 ggatggaagc cggtcttgtc gatcaggatg atctggacgaagagcatcag gggctcgcgc 9900 cagccgaact gttcgccagg ctcaaggcgc gcatgcccgacggcgaggat ctcgtcgtga 9960 cccatggcga tgcctgcttg ccgaatatca tggtggaaaatggccgcttt tctggattca 10020 tcgactgtgg ccggctgggt gtggcggacc gctatcaggacatagcgttg gctacccgtg 10080 atattgctga agagcttggc ggcgaatggg ctgaccgcttcctcgtgctt tacggtatcg 10140 ccgctcccga ttcgcagcgc atcgccttct atcgccttcttgacgagttc ttctgagcgg 10200 gactctgggg ttcgaaatga ccgaccaagc gacgcccaacctgccatcac gagatttcga 10260 ttccaccgcc gccttctatg aaaggttggg cttcggaatcgttttccggg acgccggctg 10320 gatgatcctc cagcgcgggg atctcatgct ggagttcttcgcccacccca acttgtttat 10380 tgcagcttat aatggttaca aataaagcaa tagcatcacaaatttcacaa ataaagcatt 10440 tttttcactg cattctagtt gtggtttgtc caaactcatcaatgtatctt atcatgtctg 10500 tataccgtcg agactagttc tagagcggcc gccaccgcggtggagctcca gcttttgttc 10560 cctttagtga gggttaattt cgagcttggc gtaatcatggtcatagctgt ttcctgtgtg 10620 aaattgttat ccgctcacaa ttccacacaa catacgagccggaagcataa agtgtaaagc 10680 ctggggtgcc taatgagtga gctaactcac attaattgcgttgcgctcac tgcccgcttt 10740 ccagtcggga aacctgtcgt gccagggggt acctaggccgggcaacaatt ggcggccggc 10800 cgcacttttc ggggaaatgt gcgcggaacc cctatttgtttatttttcta aatacattca 10860 aatatgtatc cgctcatgag acaataaccc tgataaatgcttcaataata ttgaaaaagg 10920 aagagtatga gtattcaaca tttccgtgtc gcccttattcccttttttgc ggcattttgc 10980 cttcctgttt ttgctcaccc agaaacgctg gtgaaagtaaaagatgctga agatcagttg 11040 ggtgcacgag tgggttacat cgaactggat ctcaacagcggtaagatcct tgagagtttt 11100 cgccccgaag aacgttttcc aatgatgagc acttttaaagttctgctatg tggcgcggta 11160 ttatcccgta ttgacgccgg gcaagagcaa ctcggtcgccgcatacacta ttctcagaat 11220 gacttggttg agtactcacc agtcacagaa aagcatcttacggatggcat gacagtaaga 11280 gaattatgca gtgctgccat aaccatgagt gataacactgcggccaactt acttctgaca 11340 acgatcggag gaccgaagga gctaaccgct tttttgcacaacatggggga tcatgtaact 11400 cgccttgatc gttgggaacc ggagctgaat 
gaagccataccaaacgacga gcgtgacacc 11460 acgatgcctg tagcaatggc aacaacgttg cgcaaactattaactggcga actacttact 11520 ctagcttccc ggcaacaatt aatagactgg atggaggcggataaagttgc aggaccactt 11580 ctgcgctcgg cccttccggc tggctggttt attgctgataaatctggagc cggtgagcgt 11640 gggtctcgcg gtatcattgc agcactgggg ccagatggtaagccctcccg tatcgtagtt 11700 atctacacga cggggagtca ggcaactatg gatgaacgaaatagacagat cgctgagata 11760 ggtgcctcac tgattaagca ttggtaactg tcagaccctaggccgggcaa caattggcgg 11820 ccggccctgc attaatgaat cggccaacgc gcggggagaggcggtttgcg tattgggcgc 11880 tcttccgctt cctcgctcac tgactcgctg cgctcggtcgttcggctgcg gcgagcggta 11940 tcagctcact caaaggcggt aatacggtta tccacagaatcaggggataa cgcaggaaag 12000 aacatgtgag caaaaggcca gcaaaaggcc aggaaccgtaaaaaggccgc gttgctggcg 12060 tttttccata ggctccgccc ccctgacgag catcacaaaaatcgacgctc aagtcagagg 12120 tggcgaaacc cgacaggact ataaagatac caggcgtttccccctggaag ctccctcgtg 12180 cgctctcctg ttccgaccct gccgcttacc ggatacctgtccgcctttct cccttcggga 12240 agcgtggcgc tttctcatag ctcacgctgt aggtatctcagttcggtgta ggtcgttcgc 12300 tccaagctgg gctgtgtgca cgaacccccc gttcagcccgaccgctgcgc cttatccggt 12360 aactatcgtc ttgagtccaa cccggtaaga cacgacttatcgccactggc agcagccact 12420 ggtaacagga ttagcagagc gaggtatgta ggcggtgctacagagttctt gaagtggtgg 12480 cctaactacg gctacactag aaggacagta tttggtatctgcgctctgct gaagccagtt 12540 accttcggaa aaagagttgg tagctcttga tccggcaaacaaaccaccgc tggtagcggt 12600 ggtttttttg tttgcaagca gcagattacg cgcagaaaaaaaggatctca agaagatcct 12660 ttgatctttt ctacggggtc tgacgctcag tggaacgaaaactc 12704 6 11273 DNA Artificial Sequence Artificial Sequencecontaining human UCOE elements and vector sequence 6 acgttgtaaaacgacggcca gtgaattgta atacgactca ctatagggcg aattgggtac 60 cgggccccccctcgaggtcg agttggggtg gggaaaagga agaaacgcgg gcgtattggc 120 cccaatggggtctcggtggg gtatcgacag agtgccagcc ctgggaccga accccgcgtt 180 tatgaacaaacgacccaaca cccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240 cgggttccttccggtattgt ctccttccgt cgacggtatc aaggtggcga ccggaatggt 300 gagctgcgagaatagccggg cgcgctgtga 
gccgaagtcg cccccgccct ggccacttcc 360 ggcgcgccgagtccttaggc cgccaggggg cgccggcgcg cgcccagatt ggggacaaag 420 gaagccgggccggccgcgtt attaccataa aaggcaaaca ctggtcggag gcgtccccgc 480 ggcgcgcggcaggaagccag gccccaaccc cctcccaacc gggcgccagc cccgcctccg 540 cccggttcaaacagcgaccg ggtcgcgcgc gcgcacgcag cggccacacc ctcgggcgcc 600 agcggctcgggcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgc gcgattagcg 660 ccattgagtcccagcgcgca cgcgcaatta gcgccaattc ccagcgcgca cgcagttagc 720 gcccaaaggaccagcgcgca cgcgcatggc gccccagccc ccaccgggcc tgacgggggc 780 tacgccgcgcccaccgtgcg atccccattg gcaagagccc ggctcagaca aagaccccgc 840 cggttgcccccgccccgaga gcggcacccc cggagcgcgc ccgcccgagc gcggcctcgc 900 gcctgcgaactggcgtgggg tgtcccccat ctccggaggc ccaggggctt ctcccgcgcc 960 ccccacggcggtccggttcc gccccatgcg ccccccgctg cggcccagac ggcggctctg 1020 cacgggcgaagggccgcggc cgcatgcccc ggtcggctgg ccgggcttac ctggcggcgg 1080 gtgtggacgggcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcg gacgcggtct 1140 cggcggtggtggcgcgtcgc gccgctgggt tttatagggc gccgccgcgg ccgctcgagc 1200 cataaaaggcaactttcgga acggcgcacg ctgattggcc ccgcgccgct cactcaccgg 1260 cttcgccgcacagtgcagca tttttttacc ccctctcccc tccttttgcg aaaaaaaaaa 1320 agagcgagagcgagattgag gaagaggagg agggagagtt ttggcgttgg ccgccttggg 1380 gtgctgggcccgggggctgg gggcgcgcgc cgtggccccc gcgccccacg ctgggcagtg 1440 cccggttcggccccgcatgg ccaggcctgc ccccggcctg cccgtctctc gggcccccca 1500 cccaccgcgggacatcctag gtgtggacat ctcttgggca ctgagcgccc aggtggggtg 1560 ggccagggtctgcacgggtg ccagggccct gggttctgta cgctcctgca gaaggagctc 1620 ttggagggcatggagtggcc aggcagtcac tcccccttgc cgacttcaga gcaactgccc 1680 tgaaagcagggcctgaggac ctctggctgt ggggctcagc tagctaaatg tgctgggtgg 1740 gtcactagggagagacctgg gcttgagagg tagagtgtgg tgttggggga gtcaggtggc 1800 ttgcggccattagagtcgca ggaccacact ccccaggaca gggcaggggc cagcggtcca 1860 gtggctggaggtggcccgtg atgaaggcta caaacctacc cagccgcagc cctgggaagg 1920 aagtgggctctacagggcag ggcacctttt accctggagc tgcctgcttt tgagggtaac 1980 agtcacgcccagccaagacc aggcctgggg cgttagtggg tgacctaggc actgcggggc 2040 
gggggggctgggtctacaca gcctgggtct gggcccaccg tccgttgtat gtctgctatg 2100 cgcagccacagctgaactgc cctcccagac catctggagg ccgctggggg actctgggga 2160 ccaagactccatgtgccaca gaggattggg ggcggggcgg tgctaggaac tcaaagccag 2220 cctgggaagaccctgtcctt gtcacccttt cttgccttgg gtctgtccac tgagtagcac 2280 acaagaccgggtgggcaggg tccgttctgc tccgggaatc acagactgtg tgtacccagg 2340 tggtgggcatgcagcgatca gtggcgtggg accacagagg gggcccgcgg tacctaaaac 2400 agcttcacatggcttaaaat aggggaccaa tgtcttttcc aatctaagtc ccatttataa 2460 taaagtccatgttccatttt taaaggacaa tcctttcggt ttaaaaccag gcacgattac 2520 ccaaacaactcacaacggta aagcactgtg aatcttctct gttctgcaat cccaacttgg 2580 tttctgctcagaaaccctcc ctctttccaa tcggtaatta aataacaaaa ggaaaaaact 2640 taagatgcttcaaccccgtt tcgtgacact ttgaaaaaag aatcacctct tgcaaacacc 2700 cgctcccgacccccgccgct gaagcccggc gtccagaggc ctaagcgcgg gtgcccgccc 2760 ccacccgggagcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaa aggccgcggc 2820 acgggggctcaagggcactg cgccacaccg cacgcgccta cccccgcgcg gccacgttaa 2880 ctggcggtcgccgcagcctc gggacagccg gccgcgcgcc gccaggctcg cggacgcggg 2940 accacgcgccgccctccggg aggcccaagt ctcgacccag ccccgcgtgg cgctggggga 3000 gggggcgcctccgccggaac gcgggtgggg gaggggaggg ggaaatgcgc tttgtctcga 3060 aatggggcaaccgtcgccac agctccctac cccctcgagg gcagagcagt ccccccacta 3120 actaccgggctggccgcgcg ccaggccagc cgcgaggcca ccgcccgacc ctccactcct 3180 tcccgcagctcccggcgcgg ggtccggcga gaaggggagg ggaggggagc ggagaaccgg 3240 gcccccgggacgcgtgtggc atctgaagca ccaccagcga gcgagagcta gagagaagga 3300 aagccaccgacttcaccgcc tccgagctgc tccgggtcgc gggtctgcag cgtctccggc 3360 cctccgcgcctacagctcaa gccacatccg aagggggagg gagccgggag ctgcgcgcgg 3420 ggccgccggggggaggggtg gcaccgccca cgccgggcgg ccacgaaggg cggggcagcg 3480 ggcgcgcgcgcggcgggggg aggggccggc gccgcgcccg ctgggaattg gggccctagg 3540 gggagggcggaggcgccgac gaccgcggca cttaccgttc gcggcgtggc gcccggtggt 3600 ccccaaggggagggaagggg gaggcggggc gaggacagtg accggagtct cctcagcggt 3660 ggcttttctgcttggcagcc tcagcggctg gcgccaaaac cggactccgc ccacttcctc 3720 gcccgccggtgcgagggtgt ggaatcctcc 
agacgctggg ggagggggag ttgggagctt 3780 aaaaactagtacccctttgg gaccactttc agcagcgaac tctcctgtac accaggggtc 3840 agttccacagacgcgggcca ggggtgggtc attgcggcgt gaacaataat ttgactagaa 3900 gttgattcgggtgtttccgg aaggggccga gtcaatccgc cgagttgggg cacggaaaac 3960 aaaaagggaaggctactaag atttttctgg cgggggttat cattggcgta actgcaggga 4020 ccacctcccgggttgagggg gctggatctc caggctgcgg attaagcccc tcccgtcggc 4080 gttaatttcaaactgcgcga cgtttctcac ctgccttcgc caaggcaggg gccgggaccc 4140 tattccaagaggtagtaact agcaggactc tagccttccg caattcattg agcgcattta 4200 cggaagtaacgtcgggtact gtctctggcc gcaagggtgg gaggagtacg catttggcgt 4260 aaggtggggcgtagagcctt cccgccattg gcggcggata gggcgtttac gcgacggcct 4320 gacgtagcggaagacgcgtt agtggggggg aaggttctag aaaagcggcg gcagcggctc 4380 tagcggcagtagcagcagcg ccgggtcccg tgcggaggtg ctcctcgcag agttgtttct 4440 cgagcagcggcagttctcac tacagcgcca ggacgagtcc ggttcgtgtt cgtccgcgga 4500 gatctctctcatctcgctcg gctgcgggaa atcgggctga agcgactgag tccgcgatgg 4560 aggtaacgggtttgaaatca atgagttatt gaaaagggca tggcgaggcc gttggcgcct 4620 cagtggaagtcggccagccg cctccgtggg agagaggcag gaaatcggac caattcagta 4680 gcagtggggcttaaggttta tgaacggggt cttgagcgga ggcctgagcg tacaaacagc 4740 ttccccaccctcagcctccc ggcgccattt cccttcactg ggggtggggg atggggagct 4800 ttcacatggcggacgctgcc ccgctggggt gaaagtgggg cgcggaggcg ggaattctta 4860 ttccctttctaaagcacgct gcttcggggg ccacggcgtc tcctcggcga gcgtttcggc 4920 gggcagcaggtcctcgtgag cgaggctgcg gagcttcccc tccccctctc tcccgggaac 4980 cgatttggcggccgccattt tcatggctcg ccttcctctc agcgttttcc ttataactct 5040 tttattttcttagtgtgctt tctctatcaa gaagtagaag tggttaacta tttttttttt 5100 cttctcgggctgttttcata tcgtttcgag gtggatttgg agtgttttgt gagcttggat 5160 ctttagagtcctgcgcacct cattaaaggc gctcagcctt cccctcgatg aaatggcgcc 5220 attgcgttcggaagccacac cgaagagcgg ggaggggggg tgctccgggt ttgcgggccc 5280 ggtttcagagaagatatcac cacccagggc gtcgggccgg gttcaatgcg agccgtagga 5340 caaagaaaccattttatgtt tttcctgtct tttttttcct ttgagtaacg gttttatctg 5400 ggtctgcagtcagtaaaacg acagatgaac cgcggcaaaa taaacataaa ttggaagcca 5460 
tcggccacgaggggcaggga cgaaggtggt tttctgggcg ggggagggat attcgcgtca 5520 gaatcctttactgttcttaa ggattccgtt taagttgtag agctgactca ttttaagtaa 5580 tgttgttactgagaagttta acccttacgg gacagatcca tggaccttta tagatgatta 5640 cgaggaaagtgaaataacga ttttgtcctt agttatactt cgattaaaac atggcttcag 5700 aggctccttcctgtaatgcg tatggattga tgtgcaaaac tgttttgggc ctgggccgct 5760 ctgtatttgaactttgttac ttttctcatt ttgtttgcaa tcttggttga acattacatt 5820 gataagcataaggtctcaag cgaagggggt ctacctggtt atttttcttt gaccctaagc 5880 acgtttataaaataacattg tttaaaatcg atagtggaca tcgggtaagt ttggataaat 5940 tgtgaggtaagtaatgagtt tttgcttttt gttagtgatt tgtaaaactt gttataaatg 6000 tacattatccgtaatttcag tttagagata acctatgtgc tgacgacaat taagaataaa 6060 aactagctgaaaaaatgaaa ataactatcg tgacaagtaa ccatttcaaa agactgcttt 6120 gtgtctcataggagctagtt tgatcatttc agttaatttt ttctttaatt tttacgagtc 6180 atgaaaactacaggaaaaaa aatctgaact gggttttacc actacttttt aggagttggg 6240 agcatgcgaatggagggaga gctccgtaga actgggatga gagcagcaat taatgctgct 6300 tgctaggaacaaaaaataat tgattgaaaa ttacgtgtga ctttttagtt tgcattatgc 6360 gtttgtagcagttggtcctg gatatcactt tctctcgttt gaggtttttt aacctagtta 6420 acttttaagacaggtttcct taacattcat aagtgcccag aatacagctg tgtagtacag 6480 catataaagatttcagctct gaggtttttc ctattgactt ggaaaattgt tttgtgcctg 6540 tcgcttgccacatggccaat caagtaagct tattaatagt aatcaattac ggggtcatta 6600 gttcatagcccatatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc 6660 tgaccgcccaacgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg 6720 ccaatagggactttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg 6780 gcagtacatcaagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa 6840 tggcccgcctggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac 6900 atctacgtattagtcatcgc tattaccatg gtgatgcggt tttggcagta catcaatggg 6960 cgtggatagcggtttgactc acggggattt ccaagtctcc accccattga cgtcaatggg 7020 agtttgttttggcaccaaaa tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca 7080 ttgacgcaaatgggcggtag gcgtgtacgg tgggaggtct atataagcag agctggttta 7140 gtgaaccgtcagatcggatc cgcctgagaa 
aggaagtgag ctgtaaaggc tgagctctct 7200 ctctgacgtatgtagcctct ggttagcttc gtcactcact gttcttgact cagcatggca 7260 atctgatgaaatcccagctg taagtctgca gaaattgatg atctattaaa caataaagat 7320 gtccactaaaatggaagttt ttcctgtcat actttgttaa gaagggtgag aacagagtac 7380 ctacattttgaatggaagga ttggagctac gggggtgggg gtggggtggg attagataaa 7440 tgcctgctctttactgaagg ctctttacta ttgctttatg ataatgtttc atagttggat 7500 atcataatttaaacaagcaa aaccaaatta agggccagct cattcctcca gatccactag 7560 taattctgtggaatgtgtgt cagttagggt gtggaaagtc cccaggctcc ccagcaggca 7620 gaagtatgcaaagcatgcat ctcaattagt cagcaaccag gtgtggaaag tccccaggct 7680 ccccagcaggcagaagtatg caaagcatgc atctcaatta gtcagcaacc atagtcccgc 7740 ccctaactccgcccatcccg cccctaactc cgcccagttc cgcccattct ccgccccatg 7800 gctgactaattttttttatt tatgcagagg ccgaggccgc ctctgcctct gagctattcc 7860 agaagtagtgaggaggcttt tttggaggcc taggcttttg caaaaagctc ccgggagctt 7920 gtatatccattttcggatct gatcaagaga caggatgagg atcgtttcgc atgattgaac 7980 aagatggattgcacgcaggt tctccggccg cttgggtgga gaggctattc ggctatgact 8040 gggcacaacagacaatcggc tgctctgatg ccgccgtgtt ccggctgtca gcgcaggggc 8100 gcccggttctttttgtcaag accgacctgt ccggtgccct gaatgaactg caggacgagg 8160 cagcgcggctatcstggctg gccacgacgg gcgttccttg cgcagctgtg ctcgacgttg 8220 tcactgaagcgggaagggac tggctgctat tgggcgaagt gccggggcag gatctcctgt 8280 catctcaccttgctcctgcc gagaaagtat ccatcatggc tgatgcaatg cggcggctgc 8340 atacgcttgatccggctacc tgcccattcg accaccaagc gaaacatcgc atcgagcgag 8400 cacgtactcggatggaagcc ggtcttgtcg atcaggatga tctggacgaa gagcatcagg 8460 ggctcgcgccagccgaactg ttcgccaggc tcaaggcgcg catgcccgac ggcgaggatc 8520 tcgtcgtgacccatggcgat gcctgcttgc cgaatatcat ggtggaaaat ggccgctttt 8580 ctggattcatcgactgtggc cggctgggtg tggcggaccg ctatcaggac atagcgttgg 8640 ctacccgtgatattgctgaa gagcttggcg gcgaatgggc tgaccgcttc ctcgtgcttt 8700 acggtatcgccgctcccgat tcgcagcgca tcgccttcta tcgccttctt gacgagttct 8760 tctgagcgggactctggggt tcgaaatgac cgaccaagcg acgcccaacc tgccatcacg 8820 agatttcgattccaccgccg ccttctatga aaggttgggc ttcggaatcg ttttccggga 8880 
cgccggctggatgatcctcc agcgcgggga tctcatgctg gagttcttcg cccaccccaa 8940 cttgtttattgcagcttata atggttacaa ataaagcaat agcatcacaa atttcacaaa 9000 taaagcatttttttcactgc attctagttg tggtttgtcc aaactcatca atgtatctta 9060 tcatgtctgtataccgtcga gactagttct agagcggccg ccaccgcggt ggagctccag 9120 cttttgttccctttagtgag ggttaatttc gagcttggcg taatcatggt catagctgtt 9180 tcctgtgtgaaattgttatc cgctcacaat tccacacaac atacgagccg gaagcataaa 9240 gtgtaaagcctggggtgcct aatgagtgag ctaactcaca ttaattgcgt tgcgctcact 9300 gcccgctttccagtcgggaa acctgtcgtg ccagggggta cctaggccgg gcaacaattg 9360 gcggccggccgcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa 9420 atacattcaaatatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat 9480 tgaaaaaggaagagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg 9540 gcattttgccttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa 9600 gatcagttgggtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt 9660 gagagttttcgccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt 9720 ggcgcggtattatcccgtat tgacgccggg caagagcaac tcggtcgccg catacactat 9780 tctcagaatgacttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg 9840 acagtaagagaattatgcag tgctgccata accatgagtg ataacactgc ggccaactta 9900 cttctgacaacgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat 9960 catgtaactcgccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag 10020 cgtgacaccacgatgcctgt agcaatggca acaacgttgc gcaaactatt aactggcgaa 10080 ctacttactctagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca 10140 ggaccacttctgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc 10200 ggtgagcgtgggtctcgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt 10260 atcgtagttatctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc 10320 gctgagataggtgcctcact gattaagcat tggtaactgt cagaccctag gccgggcaac 10380 aattggcggccggccctgca ttaatgaatc ggccaacgcg cggggagagg cggtttgcgt 10440 attgggcgctcttccgcttc ctcgctcact gactcgctgc gctcggtcgt tcggctgcgg 10500 cgagcggtatcagctcactc aaaggcggta atacggttat ccacagaatc aggggataac 10560 gcaggaaagaacatgtgagc 
aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcg 10620 ttgctggcgtttttccatag gctccgcccc cctgacgagc atcacaaaaa tcgacgctca 10680 agtcagaggtggcgaaaccc gacaggacta taaagatacc aggcgtttcc ccctggaagc 10740 tccctcgtgcgctctcctgt tccgaccctg ccgcttaccg gatacctgtc cgcctttctc 10800 ccttcgggaagcgtggcgct ttctcatagc tcacgctgta ggtatctcag ttcggtgtag 10860 gtcgttcgctccaagctggg ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgcc 10920 ttatccggtaactatcgtct tgagtccaac ccggtaagac acgacttatc gccactggca 10980 gcagccactggtaacaggat tagcagagcg aggtatgtag gcggtgctac agagttcttg 11040 aagtggtggcctaactacgg ctacactaga aggacagtat ttggtatctg cgctctgctg 11100 aagccagttaccttcggaaa aagagttggt agctcttgat ccggcaaaca aaccaccgct 11160 ggtagcggtggtttttttgt ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa 11220 gaagatcctttgatcttttc tacggggtct gacgctcagt ggaacgaaaa ctc 11273 7 12591 DNAArtificial Sequence Artificial Sequence containing human UCOE elementsand vector sequence 7 acgttgtaaa acgacggcca gtgaattgta atacgactcactatagggcg aattgggtac 60 cgggcccccc ctcgaggtcg agttggggtg gggaaaaggaagaaacgcgg gcgtattggc 120 cccaatgggg tctcggtggg gtatcgacag agtgccagccctgggaccga accccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtctttttattgc cgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgt cgactctagacccgggctgc agcgaggagc 300 tctgcgttct acggtggtca gaccgaagac tgcgacggtaccgacgctgg tcgcgcctct 360 tatacccacg tagaacgcag ctcagccaat agaatgcgtgccaatatgga atttccaggg 420 gaaaaccggg gcggtgttac gttttggctg ccctttcacttcccattgac gtgtattggc 480 tcgagaacgg tactttccca ttaatcagct atgggaaagtaccgtttaaa ggtcacgttg 540 cattagtttc aatagcccat tgacgtcaat ggtgggaaagtacatggcgt tttaattaaa 600 ttggctggaa aaacccaatg actcacccct attgaccttatgtacgtgcc aataatggga 660 aaaacccatt gactcacccc ctattgacct tttgtactgggcaaaaccca atggaaagtc 720 cctattgact cagtgtactt ggctccaatg ggactttcctgttgattcac ccctattgac 780 cttatgtact gggcaaaacc cattggaaag tccctaatgactcagtatac gtgccagtaa 840 tgggaaaaac ccattggctt acctcccatt gaccttatgtactgggcaaa acccattgga 900 aagtccctat tgactcaatg tacttggctc 
caatgggactttcctgttga ctcaccccct 960 attgacctta tgtactgggc aaaacccaat ggaaagtccctattgagtca gtgtacttgg 1020 ctccaatggg tttttcccat tgactcatcc cctattgaccttatgtactg ggcaaaaccc 1080 aatggaaagt ccctattgac gcagtgtact tggctccaatgggactttcc tgttgattca 1140 ccccctattg accttatgta ctgggcagaa tacaatggaaagtccctatt gactcaccca 1200 cattgacctt atatgcttgc caacaatgga aaaacccattggaaagtccc tattgagtca 1260 gtgtacttgg cagcaatggg tttttcccat tggctcacctcccattgacc caatgtactt 1320 gggcaaaacc cagtggaaag tcccatttga ctcagtgtgcttgccagtaa tgggaaaaac 1380 ccattggctt acctcccatt gacccaatgt acttgggtaaaggccattga gtcaccaccc 1440 ctatgctggg aaatggtgaa cgccccctat gtggaaagtccctatgggcc gccattagag 1500 tgcatgaccg tgctgattca tatgccatat gagtgtattagggggctttc cgcttgggaa 1560 attgggtaaa aagtccccgt attactcaca tagggggcgtttggctttgc aaattagggg 1620 atttcagtgc atttggcatt aaaaactatt ggttctagtcataaaacggg cggagttggg 1680 cgagctcgaa ttcaaacgac tcgacggtat caaggtggcgaccggaatgg tgagctgcga 1740 gaatagccgg gcgcgctgtg agccgaagtc gcccccgccctggccacttc cggcgcgccg 1800 agtccttagg ccgccagggg gcgccggcgc gcgcccagattggggacaaa ggaagccggg 1860 ccggccgcgt tattaccata aaaggcaaac actggtcggaggcgtccccg cggcgcgcgg 1920 caggaagcca ggccccaacc ccctcccaac cgggcgccagccccgcctcc gcccggttca 1980 aacagcgacc gggtcgcgcg cgcgcacgca gcggccacaccctcgggcgc cagcggctcg 2040 ggcaggaagt ggcgcaagcg cccgggcccc agaacgcacgcgcgattagc gccattgagt 2100 cccagcgcgc acgcgcaatt agcgccaatt cccagcgcgcacgcagttag cgcccaaagg 2160 accagcgcgc acgcgcatgg cgccccagcc cccaccgggcctgacggggg ctacgccgcg 2220 cccaccgtgc gatccccatt ggcaagagcc cggctcagacaaagaccccg ccggttgccc 2280 ccgccccgag agcggcaccc ccggagcgcg cccgcccgagcgcggcctcg cgcctgcgaa 2340 ctggcgtggg gtgtccccca tctccggagg cccaggggcttctcccgcgc cccccacggc 2400 ggtccggttc cgccccatgc gccccccgct gcggcccagacggcggctct gcacgggcga 2460 agggccgcgg ccgcatgccc cggtcggctg gccgggcttacctggcggcg ggtgtggacg 2520 ggcggcggat cggcaaaggc gaggctctgt gctcgcgggcggacgcggtc tcggcggtgg 2580 tggcgcgtcg cgccgctggg ttttataggg cgccgccgcggccgctcgag ccataaaagg 2640 
caactttcgg aacggcgcac gctgattggc cccgcgccgctcactcaccg gcttcgccgc 2700 acagtgcagc atttttttac cccctctccc ctccttttgcgaaaaaaaaa aagagcgaga 2760 gcgagattga ggaagaggag gagggagagt tttggcgttggccgccttgg ggtgctgggc 2820 ccgggggctg ggggcgcgcg ccgtggcccc cgcgccccacgctgggcagt gcccggttcg 2880 gccccgcatg gccaggcctg cccccggcct gcccgtctctcgggcccccc acccaccgcg 2940 ggacatccta ggtgtggaca tctcttgggc actgagcgcccaggtggggt gggccagggt 3000 ctgcacgggt gccagggccc tgggttctgt acgctcctgcagaaggagct cttggagggc 3060 atggagtggc caggcagtca ctcccccttg ccgacttcagagcaactgcc ctgaaagcag 3120 ggcctgagga cctctggctg tggggctcag ctagctaaatgtgctgggtg ggtcactagg 3180 gagagacctg ggcttgagag gtagagtgtg gtgttgggggagtcaggtgg cttgcggcca 3240 ttagagtcgc aggaccacac tccccaggac agggcaggggccagcggtcc agtggctgga 3300 ggtggcccgt gatgaaggct acaaacctac ccagccgcagccctgggaag gaagtgggct 3360 ctacagggca gggcaccttt taccctggag ctgcctgcttttgagggtaa cagtcacgcc 3420 cagccaagac caggcctggg gcgttagtgg gtgacctaggcactgcgggg cgggggggct 3480 gggtctacac agcctgggtc tgggcccacc gtccgttgtatgtctgctat gcgcagccac 3540 agctgaactg ccctcccaga ccatctggag gccgctgggggactctgggg accaagactc 3600 catgtgccac agaggattgg gggcggggcg gtgctaggaactcaaagcca gcctgggaag 3660 accctgtcct tgtcaccctt tcttgccttg ggtctgtccactgagtagca cacaagaccg 3720 ggtgggcagg gtccgttctg ctccgggaat cacagactgtgtgtacccag gtggtgggca 3780 tgcagcgatc agtggcgtgg gaccacagag ggggcccgcggtacctaaaa cagcttcaca 3840 tggcttaaaa taggggacca atgtcttttc caatctaagtcccatttata ataaagtcca 3900 tgttccattt ttaaaggaca atcctttcgg tttaaaaccaggcacgatta cccaaacaac 3960 tcacaacggt aaagcactgt gaatcttctc tgttctgcaatcccaacttg gtttctgctc 4020 agaaaccctc cctctttcca atcggtaatt aaataacaaaaggaaaaaac ttaagatgct 4080 tcaaccccgt ttcgtgacac tttgaaaaaa gaatcacctcttgcaaacac ccgctcccga 4140 cccccgccgc tgaagcccgg cgtccagagg cctaagcgcgggtgcccgcc cccacccggg 4200 agcgcgggcc tcgtggtcag cgcatccgcg gggagaaacaaaggccgcgg cacgggggct 4260 caagggcact gcgccacacc gcacgcgcct acccccgcgcggccacgtta actggcggtc 4320 gccgcagcct cgggacagcc ggccgcgcgc 
cgccaggctcgcggacgcgg gaccacgcgc 4380 cgccctccgg gaggcccaag tctcgaccca gccccgcgtggcgctggggg agggggcgcc 4440 tccgccggaa cgcgggtggg ggaggggagg gggaaatgcgctttgtctcg aaatggggca 4500 accgtcgcca cagctcccta ccccctcgag ggcagagcagtccccccact aactaccggg 4560 ctggccgcgc gccaggccag ccgcgaggcc accgcccgaccctccactcc ttcccgcagc 4620 tcccggcgcg gggtccggcg agaaggggag gggaggggagcggagaaccg ggcccccggg 4680 acgcgtgtgg catctgaagc accaccagcg agcgagagctagagagaagg aaagccaccg 4740 acttcaccgc ctccgagctg ctccgggtcg cgggtctgcagcgtctccgg ccctccgcgc 4800 ctacagctca agccacatcc gaagggggag ggagccgggagctgcgcgcg gggccgccgg 4860 ggggaggggt ggcaccgccc acgccgggcg gccacgaagggcggggcagc gggcgcgcgc 4920 gcggcggggg gaggggccgg cgccgcgccc gctgggaattggggccctag ggggagggcg 4980 gaggcgccga cgaccgcggc acttaccgtt cgcggcgtggcgcccggtgg tccccaaggg 5040 gagggaaggg ggaggcgggg cgaggacagt gaccggagtctcctcagcgg tggcttttct 5100 gcttggcagc ctcagcggct ggcgccaaaa ccggactccgcccacttcct cgcccgccgg 5160 tgcgagggtg tggaatcctc cagacgctgg gggagggggagttgggagct taaaaactag 5220 tacccctttg ggaccacttt cagcagcgaa ctctcctgtacaccaggggt cagttccaca 5280 gacgcgggcc aggggtgggt cattgcggcg tgaacaataatttgactaga agttgattcg 5340 ggtgtttccg gaaggggccg agtcaatccg ccgagttggggcacggaaaa caaaaaggga 5400 aggctactaa gatttttctg gcgggggtta tcattggcgtaactgcaggg accacctccc 5460 gggttgaggg ggctggatct ccaggctgcg gattaagcccctcccgtcgg cgttaatttc 5520 aaactgcgcg acgtttctca cctgccttcg ccaaggcaggggccgggacc ctattccaag 5580 aggtagtaac tagcaggact ctagccttcc gcaattcattgagcgcattt acggaagtaa 5640 cgtcgggtac tgtctctggc cgcaagggtg ggaggagtacgcatttggcg taaggtgggg 5700 cgtagagcct tcccgccatt ggcggcggat agggcgtttacgcgacggcc tgacgtagcg 5760 gaagacgcgt tagtgggggg gaaggttcta gaaaagcggcggcagcggct ctagcggcag 5820 tagcagcagc gccgggtccc gtgcggaggt gctcctcgcagagttgtttc tcgagcagcg 5880 gcagttctca ctacagcgcc aggacgagtc cggttcgtgttcgtccgcgg agatctctct 5940 catctcgctc ggctgcggga aatcgggctg aagcgactgagtccgcgatg gaggtaacgg 6000 gtttgaaatc aatgagttat tgaaaagggc atggcgaggccgttggcgcc tcagtggaag 6060 
tcggccagcc gcctccgtgg gagagaggca ggaaatcggaccaattcagt agcagtgggg 6120 cttaaggttt atgaacgggg tcttgagcgg aggcctgagcgtacaaacag cttccccacc 6180 ctcagcctcc cggcgccatt tcccttcact gggggtgggggatggggagc tttcacatgg 6240 cggacgctgc cccgctgggg tgaaagtggg gcgcggaggcgggaattctt attccctttc 6300 taaagcacgc tgcttcgggg gccacggcgt ctcctcggcgagcgtttcgg cgggcagcag 6360 gtcctcgtga gcgaggctgc ggagcttccc ctccccctctctcccgggaa ccgatttggc 6420 ggccgccatt ttcatggctc gccttcctct cagcgttttccttataactc ttttattttc 6480 ttagtgtgct ttctctatca agaagtagaa gtggttaactattttttttt tcttctcggg 6540 ctgttttcat atcgtttcga ggtggatttg gagtgttttgtgagcttgga tctttagagt 6600 cctgcgcacc tcattaaagg cgctcagcct tcccctcgatgaaatggcgc cattgcgttc 6660 ggaagccaca ccgaagagcg gggagggggg gtgctccgggtttgcgggcc cggtttcaga 6720 gaagatatca ccacccaggg cgtcgggccg ggttcaatgcgagccgtagg acaaagaaac 6780 cattttatgt ttttcctgtc ttttttttcc tttgagtaacggttttatct gggtctgcag 6840 tcagtaaaac gacagatgaa ccgcggcaaa ataaacataaattggaagcc atcggccacg 6900 aggggcaggg acgaaggtgg ttttctgggc gggggagggatattcgcgtc agaatccttt 6960 actgttctta aggattccgt ttaagttgta gagctgactcattttaagta atgttgttac 7020 tgagaagttt aacccttacg ggacagatcc atggacctttatagatgatt acgaggaaag 7080 tgaaataacg attttgtcct tagttatact tcgattaaaacatggcttca gaggctcctt 7140 cctgtaatgc gtatggattg atgtgcaaaa ctgttttgggcctgggccgc tctgtatttg 7200 aactttgtta cttttctcat tttgtttgca atcttggttgaacattacat tgataagcat 7260 aaggtctcaa gcgaaggggg tctacctggt tatttttctttgaccctaag cacgtttata 7320 aaataacatt gtttaaaatc gatagtggac atcgggtaagtttggataaa ttgtgaggta 7380 agtaatgagt ttttgctttt tgttagtgat ttgtaaaacttgttataaat gtacattatc 7440 cgtaatttca gtttagagat aacctatgtg ctgacgacaattaagaataa aaactagctg 7500 aaaaaatgaa aataactatc gtgacaagta accatttcaaaagactgctt tgtgtctcat 7560 aggagctagt ttgatcattt cagttaattt tttctttaatttttacgagt catgaaaact 7620 acaggaaaaa aaatctgaac tgggttttac cactactttttaggagttgg gagcatgcga 7680 atggagggag agctccgtag aactgggatg agagcagcaattaatgctgc ttgctaggaa 7740 caaaaaataa ttgattgaaa attacgtgtg 
actttttagtttgcattatg cgtttgtagc 7800 agttggtcct ggatatcact ttctctcgtt tgaggttttttaacctagtt aacttttaag 7860 acaggtttcc ttaacattca taagtgccca gaatacagctgtgtagtaca gcatataaag 7920 atttcagctc tgaggttttt cctattgact tggaaaattgttttgtgcct gtcgcttgcc 7980 acatggccaa tcaagtaagc ttattaatag taatcaattacggggtcatt agttcatagc 8040 ccatatatgg agttccgcgt tacataactt acggtaaatggcccgcctgg ctgaccgccc 8100 aacgaccccc gcccattgac gtcaataatg acgtatgttcccatagtaac gccaataggg 8160 actttccatt gacgtcaatg ggtggagtat ttacggtaaactgcccactt ggcagtacat 8220 caagtgtatc atatgccaag tacgccccct attgacgtcaatgacggtaa atggcccgcc 8280 tggcattatg cccagtacat gaccttatgg gactttcctacttggcagta catctacgta 8340 ttagtcatcg ctattaccat ggtgatgcgg ttttggcagtacatcaatgg gcgtggatag 8400 cggtttgact cacggggatt tccaagtctc caccccattgacgtcaatgg gagtttgttt 8460 tggcaccaaa atcaacggga ctttccaaaa tgtcgtaacaactccgcccc attgacgcaa 8520 atgggcggta ggcgtgtacg gtgggaggtc tatataagcagagctggttt agtgaaccgt 8580 cagatcggat ccgcctgaga aaggaagtga gctgtaaaggctgagctctc tctctgacgt 8640 atgtagcctc tggttagctt cgtcactcac tgttcttgactcagcatggc aatctgatga 8700 aatcccagct gtaagtctgc agaaattgat gatctattaaacaataaaga tgtccactaa 8760 aatggaagtt tttcctgtca tactttgtta agaagggtgagaacagagta cctacatttt 8820 gaatggaagg attggagcta cgggggtggg ggtggggtgggattagataa atgcctgctc 8880 tttactgaag gctctttact attgctttat gataatgtttcatagttgga tatcataatt 8940 taaacaagca aaaccaaatt aagggccagc tcattcctccagatccacta gttctagagc 9000 aaattctacc gggtagggga ggcgcttttc ccaaggcagtctggagcatg cgctttagca 9060 gccccgctgg gcacttggcg ctacacaagt ggcctctggcctcgcacaca ttccacatcc 9120 accggtaggc gccaaccggc tccgttcttt ggtggccccttcgcgccacc ttctactcct 9180 cccctagtca ggaagttccc ccccgccccg cagctcgcgtcgtgcaggac gtgacaaatg 9240 gaagtagcac gtctcactag tctcgtgcag atggacagcaccgctgagca atggaagcgg 9300 gtaggccttt ggggcagcgg ccaatagcag ctttgctccttcgctttctg ggctcagagg 9360 ctgggaaggg gtgggtccgg gggcgggctc aggggcgggctcaggggcgg ggcgggcgcc 9420 cgaaggtcct ccggaggccc ggcattctgc acgcttcaaaagcgcacgtc tgccgcgctg 9480 
ttctcctctt cctcatctcc gggcctttcg accagcttaccatgaccgag tacaagccca 9540 cggtgcgcct cgccacccgc gacgacgtcc ccagggccgtacgcaccctc gccgccgcgt 9600 tcgccgacta ccccgccacg cgccacaccg tcgatccggaccgccacatc gagcgggtca 9660 ccgagctgca agaactcttc ctcacgcgcg tcgggctcgacatcggcaag gtgtgggtcg 9720 cggacgacgg cgccgcggtg gcggtctgga ccacgccggagagcgtcgaa gcgggggcgg 9780 tgttcgccga gatcggcccg cgcatggccg agttgagcggttcccggctg gccgcgcaga 9840 acagatggaa ggcctcctgg cgccgcaccg gcccaaggagcccgcgtggt tcctggccac 9900 cgtcgcgtct cgcccgacca ccagggcaag ggtctgggcagcgccgtcgt gctccccgga 9960 gtggaggcgg ccgagcgcgc cggggtgccc gccttcctggagacctccgc gccccgcaac 10020 ctccccttct acgagcggct cggcttcacc gtcaccgccgacgtcgaggt gcccgaagga 10080 ccgcgcacct ggtgcatgac ccgcaagccc ggtgcctgacgcccgcccca cgacccgcag 10140 cgcccgaccg aaaggagcgc acgaccccat gcataggttgggcttcggaa tcgttttccg 10200 ggacgccggc tggatgatcc tccagcgcgg ggatctcatgctggagttct tcgcccaccc 10260 caacttgttt attgcagctt ataatggtta caaataaagcaatagcatca caaatttcac 10320 aaataaagca tttttttcac tgcattctag ttgtggtttgtccaaactca tcaatgtatc 10380 ttatcatgtc tgtataccgt cgagatctag agcggccgccaccgcggtgg agctccagct 10440 tttgttccct ttagtgaggg ttaatttcga gcttggcgtaatcatggtca tagctgtttc 10500 ctgtgtgaaa ttgttatccg ctcacaattc cacacaacatacgagccgga agcataaagt 10560 gtaaagcctg gggtgcctaa tgagtgagct aactcacattaattgcgttg cgctcactgc 10620 ccgctttcca gtcgggaaac ctgtcgtgcc agggggtacctaggccgggc aacaattggc 10680 ggccggccgc acttttcggg gaaatgtgcg cggaacccctatttgtttat ttttctaaat 10740 acattcaaat atgtatccgc tcatgagaca ataaccctgataaatgcttc aataatattg 10800 aaaaaggaag agtatgagta ttcaacattt ccgtgtcgcccttattccct tttttgcggc 10860 attttgcctt cctgtttttg ctcacccaga aacgctggtgaaagtaaaag atgctgaaga 10920 tcagttgggt gcacgagtgg gttacatcga actggatctcaacagcggta agatccttga 10980 gagttttcgc cccgaagaac gttttccaat gatgagcacttttaaagttc tgctatgtgg 11040 cgcggtatta tcccgtattg acgccgggca agagcaactcggtcgccgca tacactattc 11100 tcagaatgac ttggttgagt actcaccagt cacagaaaagcatcttacgg atggcatgac 11160 agtaagagaa 
ttatgcagtg ctgccataac catgagtgataacactgcgg ccaacttact 11220 tctgacaacg atcggaggac cgaaggagct aaccgcttttttgcacaaca tgggggatca 11280 tgtaactcgc cttgatcgtt gggaaccgga gctgaatgaagccataccaa acgacgagcg 11340 tgacaccacg atgcctgtag caatggcaac aacgttgcgcaaactattaa ctggcgaact 11400 acttactcta gcttcccggc aacaattaat agactggatggaggcggata aagttgcagg 11460 accacttctg cgctcggccc ttccggctgg ctggtttattgctgataaat ctggagccgg 11520 tgagcgtggg tctcgcggta tcattgcagc actggggccagatggtaagc cctcccgtat 11580 cgtagttatc tacacgacgg ggagtcaggc aactatggatgaacgaaata gacagatcgc 11640 tgagataggt gcctcactga ttaagcattg gtaactgtcagaccctaggc cgggcaacaa 11700 ttggcggccg gccctgcatt aatgaatcgg ccaacgcgcggggagaggcg gtttgcgtat 11760 tgggcgctct tccgcttcct cgctcactga ctcgctgcgctcggtcgttc ggctgcggcg 11820 agcggtatca gctcactcaa aggcggtaat acggttatccacagaatcag gggataacgc 11880 aggaaagaac atgtgagcaa aaggccagca aaaggccaggaaccgtaaaa aggccgcgtt 11940 gctggcgttt ttccataggc tccgcccccc tgacgagcatcacaaaaatc gacgctcaag 12000 tcagaggtgg cgaaacccga caggactata aagataccaggcgtttcccc ctggaagctc 12060 cctcgtgcgc tctcctgttc cgaccctgcc gcttaccggatacctgtccg cctttctccc 12120 ttcgggaagc gtggcgcttt ctcatagctc acgctgtaggtatctcagtt cggtgtaggt 12180 cgttcgctcc aagctgggct gtgtgcacga accccccgttcagcccgacc gctgcgcctt 12240 atccggtaac tatcgtcttg agtccaaccc ggtaagacacgacttatcgc cactggcagc 12300 agccactggt aacaggatta gcagagcgag gtatgtaggcggtgctacag agttcttgaa 12360 gtggtggcct aactacggct acactagaag gacagtatttggtatctgcg ctctgctgaa 12420 gccagttacc ttcggaaaaa gagttggtag ctcttgatccggcaaacaaa ccaccgctgg 12480 tagcggtggt ttttttgttt gcaagcagca gattacgcgcagaaaaaaag gatctcaaga 12540 agatcctttg atcttttcta cggggtctga cgctcagtggaacgaaaact c 12591 8 11160 DNA Artificial Sequence Artificial Sequencecontaining human UCOE elements and vector sequence 8 acgttgtaaaacgacggcca gtgaattgta atacgactca ctatagggcg aattgggtac 60 cgggccccccctcgaggtcg agttggggtg gggaaaagga agaaacgcgg gcgtattggc 120 cccaatggggtctcggtggg gtatcgacag agtgccagcc ctgggaccga accccgcgtt 180 
tatgaacaaacgacccaaca cccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240 cgggttccttccggtattgt ctccttccgt cgacggtatc aaggtggcga ccggaatggt 300 gagctgcgagaatagccggg cgcgctgtga gccgaagtcg cccccgccct ggccacttcc 360 ggcgcgccgagtccttaggc cgccaggggg cgccggcgcg cgcccagatt ggggacaaag 420 gaagccgggccggccgcgtt attaccataa aaggcaaaca ctggtcggag gcgtccccgc 480 ggcgcgcggcaggaagccag gccccaaccc cctcccaacc gggcgccagc cccgcctccg 540 cccggttcaaacagcgaccg ggtcgcgcgc gcgcacgcag cggccacacc ctcgggcgcc 600 agcggctcgggcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgc gcgattagcg 660 ccattgagtcccagcgcgca cgcgcaatta gcgccaattc ccagcgcgca cgcagttagc 720 gcccaaaggaccagcgcgca cgcgcatggc gccccagccc ccaccgggcc tgacgggggc 780 tacgccgcgcccaccgtgcg atccccattg gcaagagccc ggctcagaca aagaccccgc 840 cggttgcccccgccccgaga gcggcacccc cggagcgcgc ccgcccgagc gcggcctcgc 900 gcctgcgaactggcgtgggg tgtcccccat ctccggaggc ccaggggctt ctcccgcgcc 960 ccccacggcggtccggttcc gccccatgcg ccccccgctg cggcccagac ggcggctctg 1020 cacgggcgaagggccgcggc cgcatgcccc ggtcggctgg ccgggcttac ctggcggcgg 1080 gtgtggacgggcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcg gacgcggtct 1140 cggcggtggtggcgcgtcgc gccgctgggt tttatagggc gccgccgcgg ccgctcgagc 1200 cataaaaggcaactttcgga acggcgcacg ctgattggcc ccgcgccgct cactcaccgg 1260 cttcgccgcacagtgcagca tttttttacc ccctctcccc tccttttgcg aaaaaaaaaa 1320 agagcgagagcgagattgag gaagaggagg agggagagtt ttggcgttgg ccgccttggg 1380 gtgctgggcccgggggctgg gggcgcgcgc cgtggccccc gcgccccacg ctgggcagtg 1440 cccggttcggccccgcatgg ccaggcctgc ccccggcctg cccgtctctc gggcccccca 1500 cccaccgcgggacatcctag gtgtggacat ctcttgggca ctgagcgccc aggtggggtg 1560 ggccagggtctgcacgggtg ccagggccct gggttctgta cgctcctgca gaaggagctc 1620 ttggagggcatggagtggcc aggcagtcac tcccccttgc cgacttcaga gcaactgccc 1680 tgaaagcagggcctgaggac ctctggctgt ggggctcagc tagctaaatg tgctgggtgg 1740 gtcactagggagagacctgg gcttgagagg tagagtgtgg tgttggggga gtcaggtggc 1800 ttgcggccattagagtcgca ggaccacact ccccaggaca gggcaggggc cagcggtcca 1860 gtggctggaggtggcccgtg atgaaggcta caaacctacc 
cagccgcagc cctgggaagg 1920 aagtgggctctacagggcag ggcacctttt accctggagc tgcctgcttt tgagggtaac 1980 agtcacgcccagccaagacc aggcctgggg cgttagtggg tgacctaggc actgcggggc 2040 gggggggctgggtctacaca gcctgggtct gggcccaccg tccgttgtat gtctgctatg 2100 cgcagccacagctgaactgc cctcccagac catctggagg ccgctggggg actctgggga 2160 ccaagactccatgtgccaca gaggattggg ggcggggcgg tgctaggaac tcaaagccag 2220 cctgggaagaccctgtcctt gtcacccttt cttgccttgg gtctgtccac tgagtagcac 2280 acaagaccgggtgggcaggg tccgttctgc tccgggaatc acagactgtg tgtacccagg 2340 tggtgggcatgcagcgatca gtggcgtggg accacagagg gggcccgcgg tacctaaaac 2400 agcttcacatggcttaaaat aggggaccaa tgtcttttcc aatctaagtc ccatttataa 2460 taaagtccatgttccatttt taaaggacaa tcctttcggt ttaaaaccag gcacgattac 2520 ccaaacaactcacaacggta aagcactgtg aatcttctct gttctgcaat cccaacttgg 2580 tttctgctcagaaaccctcc ctctttccaa tcggtaatta aataacaaaa ggaaaaaact 2640 taagatgcttcaaccccgtt tcgtgacact ttgaaaaaag aatcacctct tgcaaacacc 2700 cgctcccgacccccgccgct gaagcccggc gtccagaggc ctaagcgcgg gtgcccgccc 2760 ccacccgggagcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaa aggccgcggc 2820 acgggggctcaagggcactg cgccacaccg cacgcgccta cccccgcgcg gccacgttaa 2880 ctggcggtcgccgcagcctc gggacagccg gccgcgcgcc gccaggctcg cggacgcggg 2940 accacgcgccgccctccggg aggcccaagt ctcgacccag ccccgcgtgg cgctggggga 3000 gggggcgcctccgccggaac gcgggtgggg gaggggaggg ggaaatgcgc tttgtctcga 3060 aatggggcaaccgtcgccac agctccctac cccctcgagg gcagagcagt ccccccacta 3120 actaccgggctggccgcgcg ccaggccagc cgcgaggcca ccgcccgacc ctccactcct 3180 tcccgcagctcccggcgcgg ggtccggcga gaaggggagg ggaggggagc ggagaaccgg 3240 gcccccgggacgcgtgtggc atctgaagca ccaccagcga gcgagagcta gagagaagga 3300 aagccaccgacttcaccgcc tccgagctgc tccgggtcgc gggtctgcag cgtctccggc 3360 cctccgcgcctacagctcaa gccacatccg aagggggagg gagccgggag ctgcgcgcgg 3420 ggccgccggggggaggggtg gcaccgccca cgccgggcgg ccacgaaggg cggggcagcg 3480 ggcgcgcgcgcggcgggggg aggggccggc gccgcgcccg ctgggaattg gggccctagg 3540 gggagggcggaggcgccgac gaccgcggca cttaccgttc gcggcgtggc gcccggtggt 3600 
ccccaaggggagggaagggg gaggcggggc gaggacagtg accggagtct cctcagcggt 3660 ggcttttctgcttggcagcc tcagcggctg gcgccaaaac cggactccgc ccacttcctc 3720 gcccgccggtgcgagggtgt ggaatcctcc agacgctggg ggagggggag ttgggagctt 3780 aaaaactagtacccctttgg gaccactttc agcagcgaac tctcctgtac accaggggtc 3840 agttccacagacgcgggcca ggggtgggtc attgcggcgt gaacaataat ttgactagaa 3900 gttgattcgggtgtttccgg aaggggccga gtcaatccgc cgagttgggg cacggaaaac 3960 aaaaagggaaggctactaag atttttctgg cgggggttat cattggcgta actgcaggga 4020 ccacctcccgggttgagggg gctggatctc caggctgcgg attaagcccc tcccgtcggc 4080 gttaatttcaaactgcgcga cgtttctcac ctgccttcgc caaggcaggg gccgggaccc 4140 tattccaagaggtagtaact agcaggactc tagccttccg caattcattg agcgcattta 4200 cggaagtaacgtcgggtact gtctctggcc gcaagggtgg gaggagtacg catttggcgt 4260 aaggtggggcgtagagcctt cccgccattg gcggcggata gggcgtttac gcgacggcct 4320 gacgtagcggaagacgcgtt agtggggggg aaggttctag aaaagcggcg gcagcggctc 4380 tagcggcagtagcagcagcg ccgggtcccg tgcggaggtg ctcctcgcag agttgtttct 4440 cgagcagcggcagttctcac tacagcgcca ggacgagtcc ggttcgtgtt cgtccgcgga 4500 gatctctctcatctcgctcg gctgcgggaa atcgggctga agcgactgag tccgcgatgg 4560 aggtaacgggtttgaaatca atgagttatt gaaaagggca tggcgaggcc gttggcgcct 4620 cagtggaagtcggccagccg cctccgtggg agagaggcag gaaatcggac caattcagta 4680 gcagtggggcttaaggttta tgaacggggt cttgagcgga ggcctgagcg tacaaacagc 4740 ttccccaccctcagcctccc ggcgccattt cccttcactg ggggtggggg atggggagct 4800 ttcacatggcggacgctgcc ccgctggggt gaaagtgggg cgcggaggcg ggaattctta 4860 ttccctttctaaagcacgct gcttcggggg ccacggcgtc tcctcggcga gcgtttcggc 4920 gggcagcaggtcctcgtgag cgaggctgcg gagcttcccc tccccctctc tcccgggaac 4980 cgatttggcggccgccattt tcatggctcg ccttcctctc agcgttttcc ttataactct 5040 tttattttcttagtgtgctt tctctatcaa gaagtagaag tggttaacta tttttttttt 5100 cttctcgggctgttttcata tcgtttcgag gtggatttgg agtgttttgt gagcttggat 5160 ctttagagtcctgcgcacct cattaaaggc gctcagcctt cccctcgatg aaatggcgcc 5220 attgcgttcggaagccacac cgaagagcgg ggaggggggg tgctccgggt ttgcgggccc 5280 ggtttcagagaagatatcac cacccagggc 
gtcgggccgg gttcaatgcg agccgtagga 5340 caaagaaaccattttatgtt tttcctgtct tttttttcct ttgagtaacg gttttatctg 5400 ggtctgcagtcagtaaaacg acagatgaac cgcggcaaaa taaacataaa ttggaagcca 5460 tcggccacgaggggcaggga cgaaggtggt tttctgggcg ggggagggat attcgcgtca 5520 gaatcctttactgttcttaa ggattccgtt taagttgtag agctgactca ttttaagtaa 5580 tgttgttactgagaagttta acccttacgg gacagatcca tggaccttta tagatgatta 5640 cgaggaaagtgaaataacga ttttgtcctt agttatactt cgattaaaac atggcttcag 5700 aggctccttcctgtaatgcg tatggattga tgtgcaaaac tgttttgggc ctgggccgct 5760 ctgtatttgaactttgttac ttttctcatt ttgtttgcaa tcttggttga acattacatt 5820 gataagcataaggtctcaag cgaagggggt ctacctggtt atttttcttt gaccctaagc 5880 acgtttataaaataacattg tttaaaatcg atagtggaca tcgggtaagt ttggataaat 5940 tgtgaggtaagtaatgagtt tttgcttttt gttagtgatt tgtaaaactt gttataaatg 6000 tacattatccgtaatttcag tttagagata acctatgtgc tgacgacaat taagaataaa 6060 aactagctgaaaaaatgaaa ataactatcg tgacaagtaa ccatttcaaa agactgcttt 6120 gtgtctcataggagctagtt tgatcatttc agttaatttt ttctttaatt tttacgagtc 6180 atgaaaactacaggaaaaaa aatctgaact gggttttacc actacttttt aggagttggg 6240 agcatgcgaatggagggaga gctccgtaga actgggatga gagcagcaat taatgctgct 6300 tgctaggaacaaaaaataat tgattgaaaa ttacgtgtga ctttttagtt tgcattatgc 6360 gtttgtagcagttggtcctg gatatcactt tctctcgttt gaggtttttt aacctagtta 6420 acttttaagacaggtttcct taacattcat aagtgcccag aatacagctg tgtagtacag 6480 catataaagatttcagctct gaggtttttc ctattgactt ggaaaattgt tttgtgcctg 6540 tcgcttgccacatggccaat caagtaagct tattaatagt aatcaattac ggggtcatta 6600 gttcatagcccatatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc 6660 tgaccgcccaacgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg 6720 ccaatagggactttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg 6780 gcagtacatcaagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa 6840 tggcccgcctggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac 6900 atctacgtattagtcatcgc tattaccatg gtgatgcggt tttggcagta catcaatggg 6960 cgtggatagcggtttgactc acggggattt ccaagtctcc accccattga cgtcaatggg 7020 
agtttgttttggcaccaaaa tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca 7080 ttgacgcaaatgggcggtag gcgtgtacgg tgggaggtct atataagcag agctggttta 7140 gtgaaccgtcagatcggatc cgcctgagaa aggaagtgag ctgtaaaggc tgagctctct 7200 ctctgacgtatgtagcctct ggttagcttc gtcactcact gttcttgact cagcatggca 7260 atctgatgaaatcccagctg taagtctgca gaaattgatg atctattaaa caataaagat 7320 gtccactaaaatggaagttt ttcctgtcat actttgttaa gaagggtgag aacagagtac 7380 ctacattttgaatggaagga ttggagctac gggggtgggg gtggggtggg attagataaa 7440 tgcctgctctttactgaagg ctctttacta ttgctttatg ataatgtttc atagttggat 7500 atcataatttaaacaagcaa aaccaaatta agggccagct cattcctcca gatccactag 7560 ttctagagcaaattctaccg ggtaggggag gcgcttttcc caaggcagtc tggagcatgc 7620 gctttagcagccccgctggg cacttggcgc tacacaagtg gcctctggcc tcgcacacat 7680 tccacatccaccggtaggcg ccaaccggct ccgttctttg gtggcccctt cgcgccacct 7740 tctactcctcccctagtcag gaagttcccc cccgccccgc agctcgcgtc gtgcaggacg 7800 tgacaaatggaagtagcacg tctcactagt ctcgtgcaga tggacagcac cgctgagcaa 7860 tggaagcgggtaggcctttg gggcagcggc caatagcagc tttgctcctt cgctttctgg 7920 gctcagaggctgggaagggg tgggtccggg ggcgggctca ggggcgggct caggggcggg 7980 gcgggcgcccgaaggtcctc cggaggcccg gcattctgca cgcttcaaaa gcgcacgtct 8040 gccgcgctgttctcctcttc ctcatctccg ggcctttcga ccagcttacc atgaccgagt 8100 acaagcccacggtgcgcctc gccacccgcg acgacgtccc cagggccgta cgcaccctcg 8160 ccgccgcgttcgccgactac cccgccacgc gccacaccgt cgatccggac cgccacatcg 8220 agcgggtcaccgagctgcaa gaactcttcc tcacgcgcgt cgggctcgac atcggcaagg 8280 tgtgggtcgcggacgacggc gccgcggtgg cggtctggac cacgccggag agcgtcgaag 8340 cgggggcggtgttcgccgag atcggcccgc gcatggccga gttgagcggt tcccggctgg 8400 ccgcgcagaacagatggaag gcctcctggc gccgcaccgg cccaaggagc ccgcgtggtt 8460 cctggccaccgtcgcgtctc gcccgaccac cagggcaagg gtctgggcag cgccgtcgtg 8520 ctccccggagtggaggcggc cgagcgcgcc ggggtgcccg ccttcctgga gacctccgcg 8580 ccccgcaacctccccttcta cgagcggctc ggcttcaccg tcaccgccga cgtcgaggtg 8640 cccgaaggaccgcgcacctg gtgcatgacc cgcaagcccg gtgcctgacg cccgccccac 8700 gacccgcagcgcccgaccga aaggagcgca 
cgaccccatg cataggttgg gcttcggaat 8760 cgttttccgggacgccggct ggatgatcct ccagcgcggg gatctcatgc tggagttctt 8820 cgcccaccccaacttgttta ttgcagctta taatggttac aaataaagca atagcatcac 8880 aaatttcacaaataaagcat ttttttcact gcattctagt tgtggtttgt ccaaactcat 8940 caatgtatcttatcatgtct gtataccgtc gagatctaga gcggccgcca ccgcggtgga 9000 gctccagcttttgttccctt tagtgagggt taatttcgag cttggcgtaa tcatggtcat 9060 agctgtttcctgtgtgaaat tgttatccgc tcacaattcc acacaacata cgagccggaa 9120 gcataaagtgtaaagcctgg ggtgcctaat gagtgagcta actcacatta attgcgttgc 9180 gctcactgcccgctttccag tcgggaaacc tgtcgtgcca gggggtacct aggccgggca 9240 acaattggcggccggccgca cttttcgggg aaatgtgcgc ggaaccccta tttgtttatt 9300 tttctaaatacattcaaata tgtatccgct catgagacaa taaccctgat aaatgcttca 9360 ataatattgaaaaaggaaga gtatgagtat tcaacatttc cgtgtcgccc ttattccctt 9420 ttttgcggcattttgccttc ctgtttttgc tcacccagaa acgctggtga aagtaaaaga 9480 tgctgaagatcagttgggtg cacgagtggg ttacatcgaa ctggatctca acagcggtaa 9540 gatccttgagagttttcgcc ccgaagaacg ttttccaatg atgagcactt ttaaagttct 9600 gctatgtggcgcggtattat cccgtattga cgccgggcaa gagcaactcg gtcgccgcat 9660 acactattctcagaatgact tggttgagta ctcaccagtc acagaaaagc atcttacgga 9720 tggcatgacagtaagagaat tatgcagtgc tgccataacc atgagtgata acactgcggc 9780 caacttacttctgacaacga tcggaggacc gaaggagcta accgcttttt tgcacaacat 9840 gggggatcatgtaactcgcc ttgatcgttg ggaaccggag ctgaatgaag ccataccaaa 9900 cgacgagcgtgacaccacga tgcctgtagc aatggcaaca acgttgcgca aactattaac 9960 tggcgaactacttactctag cttcccggca acaattaata gactggatgg aggcggataa 10020 agttgcaggaccacttctgc gctcggccct tccggctggc tggtttattg ctgataaatc 10080 tggagccggtgagcgtgggt ctcgcggtat cattgcagca ctggggccag atggtaagcc 10140 ctcccgtatcgtagttatct acacgacggg gagtcaggca actatggatg aacgaaatag 10200 acagatcgctgagataggtg cctcactgat taagcattgg taactgtcag accctaggcc 10260 gggcaacaattggcggccgg ccctgcatta atgaatcggc caacgcgcgg ggagaggcgg 10320 tttgcgtattgggcgctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg 10380 gctgcggcgagcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg 
10440 ggataacgcaggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa 10500 ggccgcgttgctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg 10560 acgctcaagtcagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc 10620 tggaagctccctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc 10680 ctttctcccttcgggaagcg tggcgctttc tcatagctca cgctgtaggt atctcagttc 10740 ggtgtaggtcgttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg 10800 ctgcgccttatccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc 10860 actggcagcagccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga 10920 gttcttgaagtggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc 10980 tctgctgaagccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac 11040 caccgctggtagcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg 11100 atctcaagaagatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc 11160
9 14262 DNA Artificial Sequence Artificial Sequence containing human UCOE elements and vector sequence 9
ggtggcactt ttcggggaaa tgtgcgcgga acccctatttgtttattttt ctaaatacat 60 tcaaatatgt atccgctcat gagacaataa ccctgataaatgcttcaata atattgaaaa 120 aggaagagta tgagtattca acatttccgt gtcgcccttattcccttttt tgcggcattt 180 tgccttcctg tttttgctca cccagaaacg ctggtgaaagtaaaagatgc tgaagatcag 240 ttgggtgcac gagtgggtta catcgaactg gatctcaacagcggtaagat ccttgagagt 300 tttcgccccg aagaacgttt tccaatgatg agcacttttaaagttctgct atgtggcgcg 360 gtattatccc gtattgacgc cgggcaagag caactcggtcgccgcataca ctattctcag 420 aatgacttgg ttgagtactc accagtcaca gaaaagcatcttacggatgg catgacagta 480 agagaattat gcagtgctgc cataaccatg agtgataacactgcggccaa cttacttctg 540 acaacgatcg gaggaccgaa ggagctaacc gcttttttgcacaacatggg ggatcatgta 600 actcgccttg atcgttggga accggagctg aatgaagccataccaaacga cgagcgtgac 660 accacgatgc ctgtagcaat ggcaacaacg ttgcgcaaactattaactgg cgaactactt 720 actctagctt cccggcaaca attaatagac tggatggaggcggataaagt tgcaggacca 780 cttctgcgct cggcccttcc ggctggctgg tttattgctgataaatctgg agccggtgag 840 cgtgggtctc gcggtatcat tgcagcactg gggccagatggtaagccctc ccgtatcgta 900 
gttatctaca cgacggggag tcaggcaact atggatgaacgaaatagaca gatcgctgag 960 ataggtgcct cactgattaa gcattggtaa ctgtcagaccaagtttactc atatatactt 1020 tagattgatt taaaacttca tttttaattt aaaaggatctaggtgaagat cctttttgat 1080 aatctcatga ccaaaatccc ttaacgtgag ttttcgttccactgagcgtc agaccccgta 1140 gaaaagatca aaggatcttc ttgagatcct ttttttctgcgcgtaatctg ctgcttgcaa 1200 acaaaaaaac caccgctacc agcggtggtt tgtttgccggatcaagagct accaactctt 1260 tttccgaagg taactggctt cagcagagcg cagataccaaatactgtcct tctagtgtag 1320 ccgtagttag gccaccactt caagaactct gtagcaccgcctacatacct cgctctgcta 1380 atcctgttac cagtggctgc tgccagtggc gataagtcgtgtcttaccgg gttggactca 1440 agacgatagt taccggataa ggcgcagcgg tcgggctgaacggggggttc gtgcacacag 1500 cccagcttgg agcgaacgac ctacaccgaa ctgagatacctacagcgtga gctatgagaa 1560 agcgccacgc ttcccgaagg gagaaaggcg gacaggtatccggtaagcgg cagggtcgga 1620 acaggagagc gcacgaggga gcttccaggg ggaaacgcctggtatcttta tagtcctgtc 1680 gggtttcgcc acctctgact tgagcgtcga tttttgtgatgctcgtcagg ggggcggagc 1740 ctatggaaaa acgccagcaa cgcggccttt ttacggttcctggccttttg ctggcctttt 1800 gctcacatgt tctttcctgc gttatcccct gattctgtggataaccgtat taccgccttt 1860 gagtgagctg ataccgctcg ccgcagccga acgaccgagcgcagcgagtc agtgagcgag 1920 gaagcggaag agcgcccaat acgcaaaccg cctctccccgcgcgttggcc gattcattaa 1980 tgcagctggc acgacaggtt tcccgactgg aaagcgggcagtgagcgcaa cgcaattaat 2040 gtgagttagc tcactcatta ggcaccccag gctttacactttatgcttcc ggctcgtatg 2100 ttgtgtggaa ttgtgagcgg ataacaattt cacacaggaaacagctatga ccatgattac 2160 gccaagcgcg caattaaccc tcactaaagg gaacaaaagctgggtaccgg gccccccctc 2220 gaggtcgacg gtatcgataa gcttcaatgt ttttagcaccctctgtgtgg aggaaaataa 2280 tgcagattat tctaattagt gtaatatcta accacattaaaatatattac atagtaaact 2340 acactccata attttataaa tttgactccc cagggtaataaactagtctc tagtctgctc 2400 accttcaact gtacaataaa gtcttggttc ttttgaaatagacctcaaat gagacaccta 2460 aaattcaaag tgtctttaca tttaaagaca cctacaggaaagcaggtaaa agagccaggt 2520 taaaaacaaa ttctaaaacc acttagctgc agttaaacatatagtaaaga tgcactaaag 2580 tttcttactc tgtaaatccc ttccacttca 
ggaaatattccactttccca ttcactacac 2640 gtcgatctag tactttttcc acgacaaatt cttcaggctctgcctcttca acttttttac 2700 tctttccatt ctgttttttt cccatttttt gctaaaataaaacaaaagag aaattaagaa 2760 atattcctct tgaattttga gcacattttc aaggctcaattgcttatatt attatcacat 2820 tcgacataaa tttttacttc tatatcccag ggcagacaccttctggaaag attaaaagtc 2880 aacagacaat aaaataaaag aatgctttat cttgttcatttagttcaaac ttacaaccca 2940 ccaccaaaat aatacaataa aaaaacacta tctggaaacagttatttttt tccagtcttt 3000 ttttttgaga cagggtctca cactcttgtc gcccaggctggagtgcagtg gcgtgatctc 3060 agctcactgc aacctccgcc tccccaggtt caagcagttctcatgcctca gcctccagag 3120 tagctgggat tataggcgga tgccaccatg ccgggctaattttttttgtg tttttattag 3180 aaacagggtt tcaccatgtt gaccaggctg gtctcaaactcctgacctga agtgattcac 3240 cagcctgggc ctcccaaagt gctggcatta caggcgtgagccactgcgcc cggccctgta 3300 gtcttaaaag accaagttta ctaattttca ctcattttaacaacactgca acaaacaact 3360 atgcaggaag tacctaaagg gtgatccaga gaagcaagtagtagtgacag gtcttaggtg 3420 aacctatgac agaccttgta tccaccccca gatggtaaaagccccagccc ccttctcaat 3480 tcaaatatta atgtcaaaag catcaatgat acagagaaaagataaatgca gaatgaaaac 3540 atggttcaaa atcctgatac caactgcagg gtcaactatagagaccacta ggaggttcaa 3600 ttaaaggaca agattatttt tccataatct ctgtagataatatttcctac cacttagaac 3660 aaaactataa agctatcact tcaagagacc aacattacaaatttatttta attccctaag 3720 gtgaaaaaaa tccttccttc ctggtttctc aagagaaagtctatactggt aaccaaattc 3780 actttaaaca ggcattttct ttggtatgac actatttaagagaagcagga aaccaacgtg 3840 aaccagctct ttccaatggc tcaagatttc ctatgagaggactaaaaatg gggaaaattt 3900 ttatgagagg attaaaaatg ggggaaaaaa aaccctgaaatggttaatca gaagatccta 3960 tgggctgaga aggaatccat cttaacattt catcttaaagcaaatgctat tgccgggggc 4020 agtggctcat gcctgtaatc ccagcacttt gggaggccgaggtgggcaga tcatctgagg 4080 tcaggagttt gagaccagcc tgaccaacat ggagaaaccccgtttctact aaaaatacaa 4140 aattagccag gcatagtggt gcatgcctgt aatcccagctacttgggagg ctgaggcagg 4200 agaactgctt gaacccagga ggcttaagtt gcggtgagccaagatcacgc cattgcactc 4260 tagcctggac aacaagagaa aaactctgtc tcaaaaaaacacaaaaacaa aaaacccaaa 4320 
tactatttaa aaaagataaa ccttaattgc tcaatcattaaagccatccc acaagtaaag 4380 cagcaagcag aaaaaagtta agaacacctc aaggctacagaaggacattt caagctatgc 4440 aggcatatga agtgtgcaga cagatatgta agaaaggcctcaagactgca aaagggcatt 4500 tcaagctatg caagcatata ggtaacacat acacacacacaaaataaaat cccctgaaat 4560 acaaaaacat gcagcaaaca cctgacgttt ttggataccatttctaagtc aggtgttatg 4620 attctcatta gtcaagatac ttgagtactg ggcccaaacagctttctgcc actgtacagt 4680 acaagaaggt aggaataatg gtgggaggag caaagacaaactgtaataga cagaagtgta 4740 tcagatacct atactacatg aaaaacaaaa cagctactgccacaaaggga gaaggctaac 4800 aaaataaagt caacaataaa tacagaaaat gaaaaggatacacactaagg tttacaaaaa 4860 aaaaaaggca gacaaaatgc catacagtat tcattcactactatggcatt cataagctag 4920 tttcaaatgc tcactatttt cttttatagt atatatttgccttaacccag cacttttttc 4980 caaaagtgga tgagtcaaaa taaatttccc attatttaagtgaaattaac agcacacata 5040 tctcacaaca ctaatgaatt tttaaaatgg aaagttaagaacttttaaag tggccaacct 5100 gtgatccttc acaaaataaa ctaaatacaa taacagaccccaaaggctat caattgcgtg 5160 caaaaacaac ttctgttttc cagggtaaac agaatctaatgcagaatcta atgcagggta 5220 aacagactta atgcagaatc taatgatggc acaaattaaaaatcactaac gtgccctttt 5280 tagtgtgaaa cccagagaga gcacatacaa gccaaaaacaaatgctttat tttacctagg 5340 agacattaac attcaccttt acgtgtttaa gattaatgcaatgttaaata ttgtgaaaac 5400 tgtaactttg aatttcatga tttttatgtg aatattccagggtttaaaaa aacttgtaac 5460 atgacatggc tgaataagat aaaaaaaaaa tctagccttttctcccttct ggctcatatt 5520 tgcgatttcg atcattttgt ttaaaaaaca aaacactgcaatgaattaaa cttaatattc 5580 ttctatgttt tagagtaagt taaaacaaga taaagtgaccaaagtaattt gaaagattca 5640 atgacttttg ctccaaccta ggtgcacaag gtaccttgttctttaaattg ggctttaatg 5700 aaaatacttc tccagaattc tggggattta agaaaaattatgccaaccaa caagggcttt 5760 accattttat gtaacatttt tcaacgctgc aaaaatgtgtgtatttctat ttgaagataa 5820 aaatcctcag caaaatccac attgcactgt ccttcaaagattagccttct ttgaactagt 5880 taagacacta ttaagccaag ccagtatctc cctgtaatgaattcgttttt ctcttaattt 5940 tcccctgtaa tttacactgg gagagctggg aaatatgtggatgtaaattt ctcagccaca 6000 gagatgcaaa gttatactgt ggggaaaaaa 
aacttgagttaaatccttac atattttagg 6060 ttttcattaa cttaccaatg tagttttgtt ggaggccattttttttattg cagacttgaa 6120 gagctattac tagaaaaatg catgacagtt aaggtaagtttgcatgacac aaaaaaggta 6180 actaaataca aattctgttt ggattccaac ccccaagtagagagcgcaca ctttcaaacg 6240 tgaatacaaa tccagagtag atctgcgctc ctacctacattgcttatgat gtacttaagt 6300 acgtgtccta accatgtgag tctagaaaga ctttactggggatcctggta cctaaaacag 6360 cttcacatgg cttaaaatag gggaccaatg tcttttccaatctaagtccc atttataata 6420 aagtccatgt tccattttta aaggacaatc ctttcggtttaaaaccaggc acgattaccc 6480 aaacaactca caacggtaaa gcactgtgaa tcttctctgttctgcaatcc caacttggtt 6540 tctgctcaga aaccctccct ctttccaatc ggtaattaaataacaaaagg aaaaaactta 6600 agatgcttca accccgtttc gtgacacttt gaaaaaagaatcacctcttg caaacacccg 6660 ctcccgaccc ccgccgctga agcccggcgt ccagaggcctaagcgcgggt gcccgccccc 6720 acccgggagc gcgggcctcg tggtcagcgc atccgcggggagaaacaaag gccgcggcac 6780 gggggctcaa gggcactgcg ccacaccgca cgcgcctacccccgcgcggc cacgttaact 6840 ggcggtcgcc gcagcctcgg gacagccggc cgcgcgccgccaggctcgcg gacgcgggac 6900 cacgcgccgc cctccgggag gcccaagtct cgacccagccccgcgtggcg ctgggggagg 6960 gggcgcctcc gccggaacgc gggtggggga ggggagggggaaatgcgctt tgtctcgaaa 7020 tggggcaacc gtcgccacag ctccctaccc cctcgagggcagagcagtcc ccccactaac 7080 taccgggctg gccgcgcgcc aggccagccg cgaggccaccgcccgaccct ccactccttc 7140 ccgcagctcc cggcgcgggg tccggcgaga aggggaggggaggggagcgg agaaccgggc 7200 ccccgggacg cgtgtggcat ctgaagcacc accagcgagcgagagctaga gagaaggaaa 7260 gccaccgact tcaccgcctc cgagctgctc cgggtcgcgggtctgcagcg tctccggccc 7320 tccgcgccta cagctcaagc cacatccgaa gggggagggagccgggagct gcgcgcgggg 7380 ccgccggggg gaggggtggc accgcccacg ccgggcggccacgaagggcg gggcagcggg 7440 cgcgcgcgcg gcggggggag gggccggcgc cgcgcccgctgggaattggg gccctagggg 7500 gagggcggag gcgccgacga ccgcggcact taccgttcgcggcgtggcgc ccggtggtcc 7560 ccaaggggag ggaaggggga ggcggggcga ggacagtgaccggagtctcc tcagcggtgg 7620 cttttctgct tggcagcctc agcggctggc gccaaaaccggactccgccc acttcctcgc 7680 ccgccggtgc gagggtgtgg aatcctccag acgctgggggagggggagtt gggagcttaa 7740 
aaactagtac ccctttggga ccactttcag cagcgaactctcctgtacac caggggtcag 7800 ttccacagac gcgggccagg ggtgggtcat tgcggcgtgaacaataattt gactagaagt 7860 tgattcgggt gtttccggaa ggggccgagt caatccgccgagttggggca cggaaaacaa 7920 aaagggaagg ctactaagat ttttctggcg ggggttatcattggcgtaac tgcagggacc 7980 acctcccggg ttgagggggc tggatctcca ggctgcggattaagcccctc ccgtcggcgt 8040 taatttcaaa ctgcgcgacg tttctcacct gccttcgccaaggcaggggc cgggacccta 8100 ttccaagagg tagtaactag caggactcta gccttccgcaattcattgag cgcatttacg 8160 gaagtaacgt cgggtactgt ctctggccgc aagggtgggaggagtacgca tttggcgtaa 8220 ggtggggcgt agagccttcc cgccattggc ggcggatagggcgtttacgc gacggcctga 8280 cgtagcggaa gacgcgttag tgggggggaa ggttctagaaaagcggcggc agcggctcta 8340 gcggcagtag cagcagcgcc gggtcccgtg cggaggtgctcctcgcagag ttgtttctcg 8400 agcagcggca gttctcacta cagcgccagg acgagtccggttcgtgttcg tccgcggaga 8460 tctctctcat ctcgctcggc tgcgggaaat cgggctgaagcgactgagtc cgcgatggag 8520 gtaacgggtt tgaaatcaat gagttattga aaagggcatggcgaggccgt tggcgcctca 8580 gtggaagtcg gccagccgcc tccgtgggag agaggcaggaaatcggacca attcagtagc 8640 agtggggctt aaggtttatg aacggggtct tgagcggaggcctgagcgta caaacagctt 8700 ccccaccctc agcctcccgg cgccatttcc cttcactgggggtgggggat ggggagcttt 8760 cacatggcgg acgctgcccc gctggggtga aagtggggcgcggaggcggg aattcttatt 8820 ccctttctaa agcacgctgc ttcgggggcc acggcgtctcctcggcgagc gtttcggcgg 8880 gcagcaggtc ctcgtgagcg aggctgcgga gcttcccctccccctctctc ccgggaaccg 8940 atttggcggc cgccattttc atggctcgcc ttcctctcagcgttttcctt ataactcttt 9000 tattttctta gtgtgctttc tctatcaaga agtagaagtggttaactatt ttttttttct 9060 tctcgggctg ttttcatatc gtttcgaggt ggatttggagtgttttgtga gcttggatct 9120 ttagagtcct gcgcacctca ttaaaggcgc tcagccttcccctcgatgaa atggcgccat 9180 tgcgttcgga agccacaccg aagagcgggg agggggggtgctccgggttt gcgggcccgg 9240 tttcagagaa gatatcacca cccagggcgt cgggccgggttcaatgcgag ccgtaggaca 9300 aagaaaccat tttatgtttt tcctgtcttt tttttcctttgagtaacggt tttatctggg 9360 tctgcagtca gtaaaacgac agatgaaccg cggcaaaataaacataaatt ggaagccatc 9420 ggccacgagg ggcagggacg aaggtggttt 
tctgggcgggggagggatat tcgcgtcaga 9480 atcctttact gttcttaagg attccgttta agttgtagagctgactcatt ttaagtaatg 9540 ttgttactga gaagtttaac ccttacggga cagatccatggacctttata gatgattacg 9600 aggaaagtga aataacgatt ttgtccttag ttatacttcgattaaaacat ggcttcagag 9660 gctccttcct gtaatgcgta tggattgatg tgcaaaactgttttgggcct gggccgctct 9720 gtatttgaac tttgttactt ttctcatttt gtttgcaatcttggttgaac attacattga 9780 taagcataag gtctcaagcg aagggggtct acctggttatttttctttga ccctaagcac 9840 gtttataaaa taacattgtt taaaatcgat agtggacatcgggtaagttt ggataaattg 9900 tgaggtaagt aatgagtttt tgctttttgt tagtgatttgtaaaacttgt tataaatgta 9960 cattatccgt aatttcagtt tagagataac ctatgtgctgacgacaatta agaataaaaa 10020 ctagctgaaa aaatgaaaat aactatcgtg acaagtaaccatttcaaaag actgctttgt 10080 gtctcatagg agctagtttg atcatttcag ttaattttttctttaatttt tacgagtcat 10140 gaaaactaca ggaaaaaaaa tctgaactgg gttttaccactactttttag gagttgggag 10200 catgcgaatg gagggagagc tccgtagaac tgggatgagagcagcaatta atgctgcttg 10260 ctaggaacaa aaaataattg attgaaaatt acgtgtgactttttagtttg cattatgcgt 10320 ttgtagcagt tggtcctgga tatcactttc tctcgtttgaggttttttaa cctagttaac 10380 ttttaagaca ggtttcctta acattcataa gtgcccagaatacagctgtg tagtacagca 10440 tataaagatt tcagctctga ggtttttcct attgacttggaaaattgttt tgtgcctgtc 10500 gcttgccaca tggccaatca agtaagcttg attaatagtaatcaattacg gggtcattag 10560 ttcatagccc atatatggag ttccgcgtta cataacttacggtaaatggc ccgcctggct 10620 gaccgcccaa cgacccccgc ccattgacgt caataatgacgtatgttccc atagtaacgc 10680 caatagggac tttccattga cgtcaatggg tggagtatttacggtaaact gcccacttgg 10740 cagtacatca agtgtatcat atgccaagta cgccccctattgacgtcaat gacggtaaat 10800 ggcccgcctg gcattatgcc cagtacatga ccttatgggactttcctact tggcagtaca 10860 tctacgtatt agtcatcgct attaccatgg tgatgcggttttggcagtac atcaatgggc 10920 gtggatagcg gtttgactca cggggatttc caagtctccaccccattgac gtcaatggga 10980 gtttgttttg gcaccaaaat caacgggact ttccaaaatgtcgtaacaac tccgccccat 11040 tgacgcaaat gggcggtagg cgtgtacggt gggaggtctatataagcaga gctggtttag 11100 tgaaccgtca gatccgctag ccggtcgcca ccatggtgagcaagggcgag 
gagctgttca 11160 ccggggtggt gcccatcctg gtcgagctgg acggcgacgtaaacggccac aagttcagcg 11220 tgtccggcga gggcgagggc gatgccacct acggcaagctgaccctgaag ttcatctgca 11280 ccaccggcaa gctgcccgtg ccctggccca ccctcgtgaccaccctgacc tacggcgtgc 11340 agtgcttcag ccgctacccc gaccacatga agcagcacgacttcttcaag tccgccatgc 11400 ccgaaggcta cgtccaggag cgcaccatct tcttcaaggacgacggcaac tacaagaccc 11460 gcgccgaggt gaagttcgag ggcgacaccc tggtgaaccgcatcgagctg aagggcatcg 11520 acttcaagga ggacggcaac atcctggggc acaagctggagtacaactac aacagccaca 11580 acgtctatat catggccgac aagcagaaga acggcatcaaggtgaacttc aagatccgcc 11640 acaacatcga ggacggcagc gtgcagctcg ccgaccactaccagcagaac acccccatcg 11700 gcgacggccc cgtgctgctg cccgacaacc actacctgagcacccagtcc gccctgagca 11760 aagaccccaa cgagaagcgc gatcacatgg tcctgctggagttcgtgacc gccgccggga 11820 tcactctcgg catggacgag ctgtacaagt aaagcggccgcgactctaga tcataatcag 11880 ccataccaca tttgtagagg ttttacttgc tttaaaaaacctcccacacc tccccctgaa 11940 cctgaaacat aaaatgaatg caattgttgt tgttaacttgtttattgcag cttataatgg 12000 ttacaaataa agcaatagca tcacaaattt cacaaataaagcattttttt cactgcattc 12060 tagttgtggt ttgtccaaac tcatcaatgt atcttaaatcgaattctacc gggtagggga 12120 ggcgcttttc ccaaggcagt ctggagcatg cgctttagcagccccgctgg gcacttggcg 12180 ctacacaagt ggcctctggc ctcgcacaca ttccacatccaccggtaggc gccaaccggc 12240 tccgttcttt ggtggcccct tcgcgccacc ttctactcctcccctagtca ggaagttccc 12300 ccccgccccg cagctcgcgt cgtgcaggac gtgacaaatggaagtagcac gtctcactag 12360 tctcgtgcag atggacagca ccgctgagca atggaagcgggtaggccttt ggggcagcgg 12420 ccaatagcag ctttgctcct tcgctttctg ggctcagaggctgggaaggg gtgggtccgg 12480 gggcgggctc aggggcgggc tcaggggcgg ggcgggcgcccgaaggtcct ccggaggccc 12540 ggcattctgc acgcttcaaa agcgcacgtc tgccgcgctgttctcctctt cctcatctcc 12600 gggcctttcg accagcttac catgaccgag tacaagcccacggtgcgcct cgccacccgc 12660 gacgacgtcc ccagggccgt acgcaccctc gccgccgcgttcgccgacta ccccgccacg 12720 cgccacaccg tcgatccgga ccgccacatc gagcgggtcaccgagctgca agaactcttc 12780 ctcacgcgcg tcgggctcga catcggcaag gtgtgggtcgcggacgacgg cgccgcggtg 
12840 gcggtctgga ccacgccgga gagcgtcgaa gcgggggcggtgttcgccga gatcggcccg 12900 cgcatggccg agttgagcgg ttcccggctg gccgcgcagaacagatggaa ggcctcctgg 12960 cgccgcaccg gcccaaggag cccgcgtggt tcctggccaccgtcgcgtct cgcccgacca 13020 ccagggcaag ggtctgggca gcgccgtcgt gctccccggagtggaggcgg ccgagcgcgc 13080 cggggtgccc gccttcctgg agacctccgc gccccgcaacctccccttct acgagcggct 13140 cggcttcacc gtcaccgccg acgtcgaggt gcccgaaggaccgcgcacct ggtgcatgac 13200 ccgcaagccc ggtgcctgac gcccgcccca cgacccgcagcgcccgaccg aaaggagcgc 13260 acgaccccat gcatcgtaga gctcgctgat cagcctcgactgtgccttct agttgccagc 13320 catctgttgt ttgcccctcc cccgtgcctt ccttgaccctggaaggtgcc actcccactg 13380 tcctttccta ataaaatgag gaaattgcat cgcattgtctgagtaggtgt cattctattc 13440 tggggggtgg ggtggggcag gacagcaagg ggggggattgggragacaat agcaggcatg 13500 ctgggggggc ggtgggggct atggcttctg aggcggaaagaaccagctgg ggctcgagat 13560 ccactagttc tagcctcgag gctagagcgg ccgccaccgcggtggagctc caattcgccc 13620 tatagtgagt cgtattacgc gcgctcactg gccgtcgttttacaacgtcg tgactgggaa 13680 aaccctggcg ttacccaact taatcgcctt gcagcacatccccctttcgc cagctggcgt 13740 aatagcgaag aggcccgcac cgatcgccct tcccaacagttgcgcagcct gaatggcgaa 13800 tggaaattgt aagcgttaat attttgttaa aattcgcgttaaatttttgt taaatcagct 13860 cattttttaa ccaataggcc gaaatcggca aaatcccttataaatcaaaa gaatagaccg 13920 agatagggtt gagtgttgtt ccagtttgga acaagagtccactattaaag aacgtggact 13980 ccaacgtcaa agggcgaaaa accgtctatc agggcgatggcccactacgt gaaccatcac 14040 cctaatcaag ttttttgggg tcgaggtgcc gtaaagcactaaatcggaac cctaaaggga 14100 gcccccgatt tagagcttga cggggaaagc cggcgaacgtggcgagaaag gaagggaaga 14160 aagcgaaagg agcgggcgct agggcgctgg caagtgtagcggtcacgctg cgcgtaacca 14220 ccacacccgc cgcgcttaat gcgccgctac agggcgcgtcag 14262
10 13 DNA Artificial Sequence PCR primer 10 aacaattggc ggc 13
11 13 DNA Artificial Sequence PCR primer 11 gccaattgtt gcc 13
12 31 DNA Artificial Sequence PCR primer 12 acgcgtcgac ggaaggagac aataccggaa g 31
13 28 DNA Artificial Sequence PCR primer 13 ccgctcgagt tggggtgggg aaaaggaa 28
14 30 DNA Artificial Sequence PCR primer 14 cgggatccgc ctgagaaagg aagtgagctg 30
15 29 DNA Artificial Sequence PCR primer 15 gaagatctgg aggaatgagc tggccctta 29
16 8 DNA Artificial Sequence PCR primer 16 gactagtc 8
17 35 DNA Artificial Sequence PCR primer 17 ctcgagttat taatagtaat caattacggg gtcat 35
18 33 DNA Artificial Sequence PCR primer 18 gtcgacgatc tgacggttca ctaaaccagc tct 33
19 30 DNA Artificial Sequence PCR primer 19 ccaatgcata ggttgggctt cgggaatcgt 30
20 31 DNA Artificial Sequence PCR primer 20 gctctagatc tcgacggtat acagacatga t 31
21 36 DNA Artificial Sequence PCR primer 21 cccaagctta ttaatagtaa tcaattacgg ggtcat 36
22 36 DNA Artificial Sequence PCR primer 22 caaggatccg atctgacggt tcactaaacc agctct 36
23 20 DNA Artificial Sequence PCR primer 23 tcgagtcgtt taaactctag 20
24 20 DNA Artificial Sequence PCR primer 24 tcgactagag tttaaacgac 20
25 33 DNA Artificial Sequence PCR primer 25 gaattcgagc tcgcccaact ccgcccgttt tat 33
26 39 DNA Artificial Sequence PCR primer 26 atttgtcgac tctagacccg ggctgcagcg aggagctct 39
27 12588 DNA Artificial Sequence Artificial Sequence containing human UCOE elements and vector sequence 27
acgttgtaaaacgacggcca gtgaattgta atacgactca ctatagggcg aattgggtac 60 cgggccccccctcgaggtcg agttggggtg gggaaaagga agaaacgcgg gcgtattggc 120 cccaatggggtctcggtggg gtatcgacag agtgccagcc ctgggaccga accccgcgtt 180 tatgaacaaacgacccaaca cccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240 cgggttccttccggtattgt ctccttccgt cgacgatctg acggttcact aaaccagctc 300 tgcttatatagacctcccac cgtacacgcc taccgcccat ttgcgtcaat ggggcggagt 360 tgttacgacattttggaaag tcccgttgat tttggtgcca aaacaaactc ccattgacgt 420 caatggggtggagacttgga aatccccgtg agtcaaaccg ctatccacgc ccattgatgt 480 actgccaaaaccgcatcacc atggtaatag cgatgactaa tacgtagatg tactgccaag 540 taggaaagtcccataaggtc atgtactggg cataatgcca ggcgggccat ttaccgtcat 600 tgacgtcaatagggggcgta cttggcatat gatacacttg atgtactgcc aagtgggcag 660 tttaccgtaaatactccacc cattgacgtc aatggaaagt ccctattggc gttactatgg 720 gaacatacgtcattattgac gtcaatgggc gggggtcgtt gggcggtcag ccaggcgggc 780 
catttaccgtaagttatgta acgcggaact ccatatatgg gctatgaact aatgaccccg 840 taattgattactattaataa ctcgacggta tcatggtggc gaccggcatg gtgagctgcg 900 agaatagccgggcgcgctgt gagccgaagt cgcccccgcc ctggccactt ccggcgcgcc 960 gagtccttaggccgccaggg ggcgccggcg cgcgcccaga ttggggacaa aggaagccgg 1020 gccggccgcgttattaccat aaaaggcaaa cactggtcgg aggcgtcccc gcggcgcgcg 1080 gcaggaagccaggccccaac cccctcccaa ccgggcgcca gccccgcctc cgcccggttc 1140 aaacagcgaccgggtcgcgc gcgcgcacgc agcggccaca ccctcgggcg ccagcggctc 1200 gggcaggaagtggcgcaagc gcccgggccc cagaacgcac gcgcgattag cgccattgag 1260 tcccagcgcgcacgcgcaat tagcgccaat tcccagcgcg cacgcagtta gcgcccaaag 1320 gaccagcgcgcacgcgcatg gcgccccagc ccccaccggg cctgacgggg gctacgccgc 1380 gcccaccgtgcgatccccat tggcaagagc ccggctcaga caaagacccc gccggttgcc 1440 cccgccccgagagcggcacc cccggagcgc gcccgcccga gcgcggcctc gcgcctgcga 1500 actggcgtggggtgtccccc atctccggag gcccaggggc ttctcccgcg ccccccacgg 1560 cggtccggttccgccccatg cgccccccgc tgcggcccag acggcggctc tgcacgggcg 1620 aagggccgcggccgcatgcc ccggtcggct ggccgggctt acctggcggc gggtgtggac 1680 gggcggcggatcggcaaagg cgaggctctg tgctcgcggg cggacgcggt ctcggcggtg 1740 gtggcgcgtcgcgccgctgg gttttatagg gcgccgccgc ggccgctcga gccataaaag 1800 gcaactttcggaacggcgca cgctgattgg ccccgcgccg ctcactcacc ggcttcgccg 1860 cacagtgcagcattttttta ccccctctcc cctccttttg cgaaaaaaaa aaagagcgag 1920 agcgagattgaggaagagga ggagggagag ttttggcgtt ggccgccttg gggtgctggg 1980 cccgggggctgggggcgcgc gccgtggccc ccgcgcccca cgctgggcag tgcccggttc 2040 ggccccgcatggccaggcct gcccccggcc tgcccgtctc tcgggccccc cacccaccgc 2100 gggacatcctaggtgtggac atctcttggg cactgagcgc ccaggtgggg tgggccaggg 2160 tctgcacgggtgccagggcc ctgggttctg tacgctcctg cagaaggagc tcttggaggg 2220 catggagtggccaggcagtc actccccctt gccgacttca gagcaactgc cctgaaagca 2280 gggcctgaggacctctggct gtggggctca gctagctaaa tgtgctgggt gggtcactag 2340 ggagagacctgggcttgaga ggtagagtgt ggtgttgggg gagtcaggtg gcttgcggcc 2400 attagagtcgcaggaccaca ctccccagga cagggcaggg gccagcggtc cagtggctgg 2460 aggtggcccgtgatgaaggc tacaaaccta cccagccgca 
gccctgggaa ggaagtgggc 2520 tctacagggcagggcacctt ttaccctgga gctgcctgct tttgagggta acagtcacgc 2580 ccagccaagaccaggcctgg ggcgttagtg ggtgacctag gcactgcggg gcgggggggc 2640 tgggtctacacagcctgggt ctgggcccac cgtccgttgt atgtctgcta tgcgcagcca 2700 cagctgaactgccctcccag accatctgga ggccgctggg ggactctggg gaccaagact 2760 ccatgtgccacagaggattg ggggcggggc ggtgctagga actcaaagcc agcctgggaa 2820 gaccctgtccttgtcaccct ttcttgcctt gggtctgtcc actgagtagc acacaagacc 2880 gggtgggcagggtccgttct gctccgggaa tcacagactg tgtgtaccca ggtggtgggc 2940 atgcagcgatcagtggcgtg ggaccacaga gggggcccgc ggtacctaaa acagcttcac 3000 atggcttaaaataggggacc aatgtctttt ccaatctaag tcccatttat aataaagtcc 3060 atgttccatttttaaaggac aatcctttcg gtttaaaacc aggcacgatt acccaaacaa 3120 ctcacaacggtaaagcactg tgaatcttct ctgttctgca atcccaactt ggtttctgct 3180 cagaaaccctccctctttcc aatcggtaat taaataacaa aaggaaaaaa cttaagatgc 3240 ttcaaccccgtttcgtgaca ctttgaaaaa agaatcacct cttgcaaaca cccgctcccg 3300 acccccgccgctgaagcccg gcgtccagag gcctaagcgc gggtgcccgc ccccacccgg 3360 gagcgcgggcctcgtggtca gcgcatccgc ggggagaaac aaaggccgcg gcacgggggc 3420 tcaagggcactgcgccacac cgcacgcgcc tacccccgcg cggccacgtt aactggcggt 3480 cgccgcagcctcgggacagc cggccgcgcg ccgccaggct cgcggacgcg ggaccacgcg 3540 ccgccctccgggaggcccaa gtctcgaccc agccccgcgt ggcgctgggg gagggggcgc 3600 ctccgccggaacgcgggtgg gggaggggag ggggaaatgc gctttgtctc gaaatggggc 3660 aaccgtcgccacagctccct accccctcga gggcagagca gtccccccac taactaccgg 3720 gctggccgcgcgccaggcca gccgcgaggc caccgcccga ccctccactc cttcccgcag 3780 ctcccggcgcggggtccggc gagaagggga ggggagggga gcggagaacc gggcccccgg 3840 gacgcgtgtggcatctgaag caccaccagc gagcgagagc tagagagaag gaaagccacc 3900 gacttcaccgcctccgagct gctccgggtc gcgggtctgc agcgtctccg gccctccgcg 3960 cctacagctcaagccacatc cgaaggggga gggagccggg agctgcgcgc ggggccgccg 4020 gggggaggggtggcaccgcc cacgccgggc ggccacgaag ggcggggcag cgggcgcgcg 4080 cgcggcggggggaggggccg gcgccgcgcc cgctgggaat tggggcccta gggggagggc 4140 ggaggcgccgacgaccgcgg cacttaccgt tcgcggcgtg gcgcccggtg gtccccaagg 4200 
ggagggaagggggaggcggg gcgaggacag tgaccggagt ctcctcagcg gtggcttttc 4260 tgcttggcagcctcagcggc tggcgccaaa accggactcc gcccacttcc tcgcccgccg 4320 gtgcgagggtgtggaatcct ccagacgctg ggggaggggg agttgggagc ttaaaaacta 4380 gtacccctttgggaccactt tcagcagcga actctcctgt acaccagggg tcagttccac 4440 agacgcgggccaggggtggg tcattgcggc gtgaacaata atttgactag aagttgattc 4500 gggtgtttccggaaggggcc gagtcaatcc gccgagttgg ggcacggaaa acaaaaaggg 4560 aaggctactaagatttttct ggcgggggtt atcattggcg taactgcagg gaccacctcc 4620 cgggttgagggggctggatc tccaggctgc ggattaagcc cctcccgtcg gcgttaattt 4680 caaactgcgcgacgtttctc acctgccttc gccaaggcag gggccgggac cctattccaa 4740 gaggtagtaactagcaggac tctagccttc cgcaattcat tgagcgcatt tacggaagta 4800 acgtcgggtactgtctctgg ccgcaagggt gggaggagta cgcatttggc gtaaggtggg 4860 gcgtagagccttcccgccat tggcggcgga tagggcgttt acgcgacggc ctgacgtagc 4920 ggaagacgcgttagtggggg ggaaggttct agaaaagcgg cggcagcggc tctagcggca 4980 gtagcagcagcgccgggtcc cgtgcggagg tgctcctcgc agagttgttt ctcgagcagc 5040 ggcagttctcactacagcgc caggacgagt ccggttcgtg ttcgtccgcg gagatctctc 5100 tcatctcgctcggctgcggg aaatcgggct gaagcgactg agtccgcgat ggaggtaacg 5160 ggtttgaaatcaatgagtta ttgaaaaggg catggcgagg ccgttggcgc ctcagtggaa 5220 gtcggccagccgcctccgtg ggagagaggc aggaaatcgg accaattcag tagcagtggg 5280 gcttaaggtttatgaacggg gtcttgagcg gaggcctgag cgtacaaaca gcttccccac 5340 cctcagcctcccggcgccat ttcccttcac tgggggtggg ggatggggag ctttcacatg 5400 gcggacgctgccccgctggg gtgaaagtgg ggcgcggagg cgggaattct tattcccttt 5460 ctaaagcacgctgcttcggg ggccacggcg tctcctcggc gagcgtttcg gcgggcagca 5520 ggtcctcgtgagcgaggctg cggagcttcc cctccccctc tctcccggga accgatttgg 5580 cggccgccattttcatggct cgccttcctc tcagcgtttt ccttataact cttttatttt 5640 cttagtgtgctttctctatc aagaagtaga agtggttaac tatttttttt ttcttctcgg 5700 gctgttttcatatcgtttcg aggtggattt ggagtgtttt gtgagcttgg atctttagag 5760 tcctgcgcacctcattaaag gcgctcagcc ttcccctcga tgaaatggcg ccattgcgtt 5820 cggaagccacaccgaagagc ggggaggggg ggtgctccgg gtttgcgggc ccggtttcag 5880 agaagatatcaccacccagg gcgtcgggcc 
gggttcaatg cgagccgtag gacaaagaaa 5940 ccattttatgtttttcctgt cttttttttc ctttgagtaa cggttttatc tgggtctgca 6000 gtcagtaaaacgacagatga accgcggcaa aataaacata aattggaagc catcggccac 6060 gaggggcagggacgaaggtg gttttctggg cgggggaggg atattcgcgt cagaatcctt 6120 tactgttcttaaggattccg tttaagttgt agagctgact cattttaagt aatgttgtta 6180 ctgagaagtttaacccttac gggacagatc catggacctt tatagatgat tacgaggaaa 6240 gtgaaataacgattttgtcc ttagttatac ttcgattaaa acatggcttc agaggctcct 6300 tcctgtaatgcgtatggatt gatgtgcaaa actgttttgg gcctgggccg ctctgtattt 6360 gaactttgttacttttctca ttttgtttgc aatcttggtt gaacattaca ttgataagca 6420 taaggtctcaagcgaagggg gtctacctgg ttatttttct ttgaccctaa gcacgtttat 6480 aaaataacattgtttaaaat cgatagtgga catcgggtaa gtttggataa attgtgaggt 6540 aagtaatgagtttttgcttt ttgttagtga tttgtaaaac ttgttataaa tgtacattat 6600 ccgtaatttcagtttagaga taacctatgt gctgacgaca attaagaata aaaactagct 6660 gaaaaaatgaaaataactat cgtgacaagt aaccatttca aaagactgct ttgtgtctca 6720 taggagctagtttgatcatt tcagttaatt ttttctttaa tttttacgag tcatgaaaac 6780 tacaggaaaaaaaatctgaa ctgggtttta ccactacttt ttaggagttg ggagcatgcg 6840 aatggagggagagctccgta gaactgggat gagagcagca attaatgctg cttgctagga 6900 acaaaaaataattgattgaa aattacgtgt gactttttag tttgcattat gcgtttgtag 6960 cagttggtcctggatatcac tttctctcgt ttgaggtttt ttaacctagt taacttttaa 7020 gacaggtttccttaacattc ataagtgccc agaatacagc tgtgtagtac agcatataaa 7080 gatttcagctctgaggtttt tcctattgac ttggaaaatt gttttgtgcc tgtcgcttgc 7140 cacatggccaatcaagtaag cttcgaattc gagctcgccc aactccgccc gttttatgac 7200 tagaaccaatagtttttaat gccaaatgca ctgaaatccc ctaatttgca aagccaaacg 7260 ccccctatgtgagtaatacg gggacttttt acccaatttc ccaagcggaa agccccctaa 7320 tacactcatatggcatatga atcagcacgg tcatgcactc taatggcggc ccatagggac 7380 tttccacatagggggcgttc accatttccc agcatagggg tggtgactca atggccttta 7440 cccaagtacattgggtcaat gggaggtaag ccaatgggtt tttcccatta ctggcaagca 7500 cactgagtcaaatgggactt tccactgggt tttgcccaag tacattgggt caatgggagg 7560 tgagccaatgggaaaaaccc attgctgcca agtacactga ctcaataggg actttccaat 7620 
gggtttttccattgttggca agcatataag gtcaatgtgg gtgagtcaat agggactttc 7680 cattgtattctgcccagtac ataaggtcaa tagggggtga atcaacagga aagtcccatt 7740 ggagccaagtacactgcgtc aatagggact ttccattggg ttttgcccag tacataaggt 7800 caataggggatgagtcaatg ggaaaaaccc attggagcca agtacactga ctcaataggg 7860 actttccattgggttttgcc cagtacataa ggtcaatagg gggtgagtca acaggaaagt 7920 cccattggagccaagtacat tgagtcaata gggactttcc aatgggtttt gcccagtaca 7980 taaggtcaatgggaggtaag ccaatgggtt tttcccatta ctggcacgta tactgagtca 8040 ttagggactttccaatgggt tttgcccagt acataaggtc aataggggtg aatcaacagg 8100 aaagtcccattggagccaag tacactgagt caatagggac tttccattgg gttttgccca 8160 gtacaaaaggtcaatagggg gtgagtcaat gggtttttcc cattattggc acgtacataa 8220 ggtcaataggggtgagtcat tgggtttttc cagccaattt aattaaaacg ccatgtactt 8280 tcccaccattgacgtcaatg ggctattgaa actaatgcaa cgtgaccttt aaacggtact 8340 ttcccatagctgattaatgg gaaagtaccg ttctcgagcc aatacacgtc aatgggaagt 8400 gaaagggcagccaaaacgta acaccgcccc ggttttcccc tggaaattcc atattggcac 8460 gcattctattggctgagctg cgttctacgt gggtataaga ggcgcgacca gcgtcggtac 8520 cgtcgcagtcttcggtctga ccaccgtaga acgcagagct cctcgctgca gcccgggtct 8580 agaggatccgcctgagaaag gaagtgagct gtaaaggctg agctctctct ctgacgtatg 8640 tagcctctggttagcttcgt cactcactgt tcttgactca gcatggcaat ctgatgaaat 8700 cccagctgtaagtctgcaga aattgatgat ctattaaaca ataaagatgt ccactaaaat 8760 ggaagtttttcctgtcatac tttgttaaga agggtgagaa cagagtacct acattttgaa 8820 tggaaggattggagctacgg gggtgggggt ggggtgggat tagataaatg cctgctcttt 8880 actgaaggctctttactatt gctttatgat aatgtttcat agttggatat cataatttaa 8940 acaagcaaaaccaaattaag ggccagctca ttcctccaga tccactagtt ctagagcaaa 9000 ttctaccgggtaggggaggc gcttttccca aggcagtctg gagcatgcgc tttagcagcc 9060 ccgctgggcacttggcgcta cacaagtggc ctctggcctc gcacacattc cacatccacc 9120 ggtaggcgccaaccggctcc gttctttggt ggccccttcg cgccaccttc tactcctccc 9180 ctagtcaggaagttcccccc cgccccgcag ctcgcgtcgt gcaggacgtg acaaatggaa 9240 gtagcacgtctcactagtct cgtgcagatg gacagcaccg ctgagcaatg gaagcgggta 9300 ggcctttggggcagcggcca atagcagctt 
tgctccttcg ctttctgggc tcagaggctg 9360 ggaaggggtgggtccggggg cgggctcagg ggcgggctca ggggcggggc gggcgcccga 9420 aggtcctccggaggcccggc attctgcacg cttcaaaagc gcacgtctgc cgcgctgttc 9480 tcctcttcctcatctccggg cctttcgacc agcttaccat gaccgagtac aagcccacgg 9540 tgcgcctcgccacccgcgac gacgtcccca gggccgtacg caccctcgcc gccgcgttcg 9600 ccgactaccccgccacgcgc cacaccgtcg atccggaccg ccacatcgag cgggtcaccg 9660 agctgcaagaactcttcctc acgcgcgtcg ggctcgacat cggcaaggtg tgggtcgcgg 9720 acgacggcgccgcggtggcg gtctggacca cgccggagag cgtcgaagcg ggggcggtgt 9780 tcgccgagatcggcccgcgc atggccgagt tgagcggttc ccggctggcc gcgcagaaca 9840 gatggaaggcctcctggcgc cgcaccggcc caaggagccc gcgtggttcc tggccaccgt 9900 cgcgtctcgcccgaccacca gggcaagggt ctgggcagcg ccgtcgtgct ccccggagtg 9960 gaggcggccgagcgcgccgg ggtgcccgcc ttcctggaga cctccgcgcc ccgcaacctc 10020 cccttctacgagcggctcgg cttcaccgtc accgccgacg tcgaggtgcc cgaaggaccg 10080 cgcacctggtgcatgacccg caagcccggt gcctgacgcc cgccccacga cccgcagcgc 10140 ccgaccgaaaggagcgcacg accccatgca taggttgggc ttcggaatcg ttttccggga 10200 cgccggctggatgatcctcc agcgcgggga tctcatgctg gagttcttcg cccaccccaa 10260 cttgtttattgcagcttata atggttacaa ataaagcaat agcatcacaa atttcacaaa 10320 taaagcatttttttcactgc attctagttg tggtttgtcc aaactcatca atgtatctta 10380 tcatgtctgtataccgtcga gatctagagc ggccgccacc gcggtggagc tccagctttt 10440 gttccctttagtgagggtta atttcgagct tggcgtaatc atggtcatag ctgtttcctg 10500 tgtgaaattgttatccgctc acaattccac acaacatacg agccggaagc ataaagtgta 10560 aagcctggggtgcctaatga gtgagctaac tcacattaat tgcgttgcgc tcactgcccg 10620 ctttccagtcgggaaacctg tcgtgccagg gggtacctag gccgggcaac aattggcggc 10680 cggccgcacttttcggggaa atgtgcgcgg aacccctatt tgtttatttt tctaaataca 10740 ttcaaatatgtatccgctca tgagacaata accctgataa atgcttcaat aatattgaaa 10800 aaggaagagtatgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt 10860 ttgccttcctgtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca 10920 gttgggtgcacgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag 10980 ttttcgccccgaagaacgtt ttccaatgat gagcactttt aaagttctgc 
tatgtggcgc 11040 ggtattatcccgtattgacg ccgggcaaga gcaactcggt cgccgcatac actattctca 11100 gaatgacttggttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt 11160 aagagaattatgcagtgctg ccataaccat gagtgataac actgcggcca acttacttct 11220 gacaacgatcggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt 11280 aactcgccttgatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga 11340 caccacgatgcctgtagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact 11400 tactctagcttcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc 11460 acttctgcgctcggcccttc cggctggctg gtttattgct gataaatctg gagccggtga 11520 gcgtgggtctcgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt 11580 agttatctacacgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga 11640 gataggtgcctcactgatta agcattggta actgtcagac cctaggccgg gcaacaattg 11700 gcggccggccctgcattaat gaatcggcca acgcgcgggg agaggcggtt tgcgtattgg 11760 gcgctcttccgcttcctcgc tcactgactc gctgcgctcg gtcgttcggc tgcggcgagc 11820 ggtatcagctcactcaaagg cggtaatacg gttatccaca gaatcagggg ataacgcagg 11880 aaagaacatgtgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg ccgcgttgct 11940 ggcgtttttccataggctcc gcccccctga cgagcatcac aaaaatcgac gctcaagtca 12000 gaggtggcgaaacccgacag gactataaag ataccaggcg tttccccctg gaagctccct 12060 cgtgcgctctcctgttccga ccctgccgct taccggatac ctgtccgcct ttctcccttc 12120 gggaagcgtggcgctttctc atagctcacg ctgtaggtat ctcagttcgg tgtaggtcgt 12180 tcgctccaagctgggctgtg tgcacgaacc ccccgttcag cccgaccgct gcgccttatc 12240 cggtaactatcgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc 12300 cactggtaacaggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg 12360 gtggcctaactacggctaca ctagaaggac agtatttggt atctgcgctc tgctgaagcc 12420 agttaccttcggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag 12480 cggtggtttttttgtttgca agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga 12540 tcctttgatcttttctacgg ggtctgacgc tcagtggaac gaaaactc 12588 28 11998 DNA ArtificialSequence Artificial Sequence containing human UCOE elements and vectorsequence 28 acgttgtaaa acgacggcca gtgaattgta atacgactca 
ctatagggcgaattgggtac 60 cgggcccccc ctcgaggtcg agttggggtg gggaaaagga agaaacgcgggcgtattggc 120 cccaatgggg tctcggtggg gtatcgacag agtgccagcc ctgggaccgaaccccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtc tttttattgccgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgt cgacggtatc aaggtggcgaccggaatggt 300 gagctgcgag aatagccggg cgcgctgtga gccgaagtcg cccccgccctggccacttcc 360 ggcgcgccga gtccttaggc cgccaggggg cgccggcgcg cgcccagattggggacaaag 420 gaagccgggc cggccgcgtt attaccataa aaggcaaaca ctggtcggaggcgtccccgc 480 ggcgcgcggc aggaagccag gccccaaccc cctcccaacc gggcgccagccccgcctccg 540 cccggttcaa acagcgaccg ggtcgcgcgc gcgcacgcag cggccacaccctcgggcgcc 600 agcggctcgg gcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgcgcgattagcg 660 ccattgagtc ccagcgcgca cgcgcaatta gcgccaattc ccagcgcgcacgcagttagc 720 gcccaaagga ccagcgcgca cgcgcatggc gccccagccc ccaccgggcctgacgggggc 780 tacgccgcgc ccaccgtgcg atccccattg gcaagagccc ggctcagacaaagaccccgc 840 cggttgcccc cgccccgaga gcggcacccc cggagcgcgc ccgcccgagcgcggcctcgc 900 gcctgcgaac tggcgtgggg tgtcccccat ctccggaggc ccaggggcttctcccgcgcc 960 ccccacggcg gtccggttcc gccccatgcg ccccccgctg cggcccagacggcggctctg 1020 cacgggcgaa gggccgcggc cgcatgcccc ggtcggctgg ccgggcttacctggcggcgg 1080 gtgtggacgg gcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcggacgcggtct 1140 cggcggtggt ggcgcgtcgc gccgctgggt tttatagggc gccgccgcggccgctcgagc 1200 cataaaaggc aactttcgga acggcgcacg ctgattggcc ccgcgccgctcactcaccgg 1260 cttcgccgca cagtgcagca tttttttacc ccctctcccc tccttttgcgaaaaaaaaaa 1320 agagcgagag cgagattgag gaagaggagg agggagagtt ttggcgttggccgccttggg 1380 gtgctgggcc cgggggctgg gggcgcgcgc cgtggccccc gcgccccacgctgggcagtg 1440 cccggttcgg ccccgcatgg ccaggcctgc ccccggcctg cccgtctctcgggcccccca 1500 cccaccgcgg gacatcctag gtgtggacat ctcttgggca ctgagcgcccaggtggggtg 1560 ggccagggtc tgcacgggtg ccagggccct gggttctgta cgctcctgcagaaggagctc 1620 ttggagggca tggagtggcc aggcagtcac tcccccttgc cgacttcagagcaactgccc 1680 tgaaagcagg gcctgaggac ctctggctgt ggggctcagc tagctaaatgtgctgggtgg 1740 gtcactaggg agagacctgg 
gcttgagagg tagagtgtgg tgttgggggagtcaggtggc 1800 ttgcggccat tagagtcgca ggaccacact ccccaggaca gggcaggggccagcggtcca 1860 gtggctggag gtggcccgtg atgaaggcta caaacctacc cagccgcagccctgggaagg 1920 aagtgggctc tacagggcag ggcacctttt accctggagc tgcctgcttttgagggtaac 1980 agtcacgccc agccaagacc aggcctgggg cgttagtggg tgacctaggcactgcggggc 2040 gggggggctg ggtctacaca gcctgggtct gggcccaccg tccgttgtatgtctgctatg 2100 cgcagccaca gctgaactgc cctcccagac catctggagg ccgctgggggactctgggga 2160 ccaagactcc atgtgccaca gaggattggg ggcggggcgg tgctaggaactcaaagccag 2220 cctgggaaga ccctgtcctt gtcacccttt cttgccttgg gtctgtccactgagtagcac 2280 acaagaccgg gtgggcaggg tccgttctgc tccgggaatc acagactgtgtgtacccagg 2340 tggtgggcat gcagcgatca gtggcgtggg accacagagg gggcccgcggtacctaaaac 2400 agcttcacat ggcttaaaat aggggaccaa tgtcttttcc aatctaagtcccatttataa 2460 taaagtccat gttccatttt taaaggacaa tcctttcggt ttaaaaccaggcacgattac 2520 ccaaacaact cacaacggta aagcactgtg aatcttctct gttctgcaatcccaacttgg 2580 tttctgctca gaaaccctcc ctctttccaa tcggtaatta aataacaaaaggaaaaaact 2640 taagatgctt caaccccgtt tcgtgacact ttgaaaaaag aatcacctcttgcaaacacc 2700 cgctcccgac ccccgccgct gaagcccggc gtccagaggc ctaagcgcgggtgcccgccc 2760 ccacccggga gcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaaaggccgcggc 2820 acgggggctc aagggcactg cgccacaccg cacgcgccta cccccgcgcggccacgttaa 2880 ctggcggtcg ccgcagcctc gggacagccg gccgcgcgcc gccaggctcgcggacgcggg 2940 accacgcgcc gccctccggg aggcccaagt ctcgacccag ccccgcgtggcgctggggga 3000 gggggcgcct ccgccggaac gcgggtgggg gaggggaggg ggaaatgcgctttgtctcga 3060 aatggggcaa ccgtcgccac agctccctac cccctcgagg gcagagcagtccccccacta 3120 actaccgggc tggccgcgcg ccaggccagc cgcgaggcca ccgcccgaccctccactcct 3180 tcccgcagct cccggcgcgg ggtccggcga gaaggggagg ggaggggagcggagaaccgg 3240 gcccccggga cgcgtgtggc atctgaagca ccaccagcga gcgagagctagagagaagga 3300 aagccaccga cttcaccgcc tccgagctgc tccgggtcgc gggtctgcagcgtctccggc 3360 cctccgcgcc tacagctcaa gccacatccg aagggggagg gagccgggagctgcgcgcgg 3420 ggccgccggg gggaggggtg gcaccgccca cgccgggcgg 
ccacgaagggcggggcagcg 3480 ggcgcgcgcg cggcgggggg aggggccggc gccgcgcccg ctgggaattggggccctagg 3540 gggagggcgg aggcgccgac gaccgcggca cttaccgttc gcggcgtggcgcccggtggt 3600 ccccaagggg agggaagggg gaggcggggc gaggacagtg accggagtctcctcagcggt 3660 ggcttttctg cttggcagcc tcagcggctg gcgccaaaac cggactccgcccacttcctc 3720 gcccgccggt gcgagggtgt ggaatcctcc agacgctggg ggagggggagttgggagctt 3780 aaaaactagt acccctttgg gaccactttc agcagcgaac tctcctgtacaccaggggtc 3840 agttccacag acgcgggcca ggggtgggtc attgcggcgt gaacaataatttgactagaa 3900 gttgattcgg gtgtttccgg aaggggccga gtcaatccgc cgagttggggcacggaaaac 3960 aaaaagggaa ggctactaag atttttctgg cgggggttat cattggcgtaactgcaggga 4020 ccacctcccg ggttgagggg gctggatctc caggctgcgg attaagcccctcccgtcggc 4080 gttaatttca aactgcgcga cgtttctcac ctgccttcgc caaggcaggggccgggaccc 4140 tattccaaga ggtagtaact agcaggactc tagccttccg caattcattgagcgcattta 4200 cggaagtaac gtcgggtact gtctctggcc gcaagggtgg gaggagtacgcatttggcgt 4260 aaggtggggc gtagagcctt cccgccattg gcggcggata gggcgtttacgcgacggcct 4320 gacgtagcgg aagacgcgtt agtggggggg aaggttctag aaaagcggcggcagcggctc 4380 tagcggcagt agcagcagcg ccgggtcccg tgcggaggtg ctcctcgcagagttgtttct 4440 cgagcagcgg cagttctcac tacagcgcca ggacgagtcc ggttcgtgttcgtccgcgga 4500 gatctctctc atctcgctcg gctgcgggaa atcgggctga agcgactgagtccgcgatgg 4560 aggtaacggg tttgaaatca atgagttatt gaaaagggca tggcgaggccgttggcgcct 4620 cagtggaagt cggccagccg cctccgtggg agagaggcag gaaatcggaccaattcagta 4680 gcagtggggc ttaaggttta tgaacggggt cttgagcgga ggcctgagcgtacaaacagc 4740 ttccccaccc tcagcctccc ggcgccattt cccttcactg ggggtgggggatggggagct 4800 ttcacatggc ggacgctgcc ccgctggggt gaaagtgggg cgcggaggcgggaattctta 4860 ttccctttct aaagcacgct gcttcggggg ccacggcgtc tcctcggcgagcgtttcggc 4920 gggcagcagg tcctcgtgag cgaggctgcg gagcttcccc tccccctctctcccgggaac 4980 cgatttggcg gccgccattt tcatggctcg ccttcctctc agcgttttccttataactct 5040 tttattttct tagtgtgctt tctctatcaa gaagtagaag tggttaactatttttttttt 5100 cttctcgggc tgttttcata tcgtttcgag gtggatttgg agtgttttgtgagcttggat 5160 ctttagagtc 
ctgcgcacct cattaaaggc gctcagcctt cccctcgatgaaatggcgcc 5220 attgcgttcg gaagccacac cgaagagcgg ggaggggggg tgctccgggtttgcgggccc 5280 ggtttcagag aagatatcac cacccagggc gtcgggccgg gttcaatgcgagccgtagga 5340 caaagaaacc attttatgtt tttcctgtct tttttttcct ttgagtaacggttttatctg 5400 ggtctgcagt cagtaaaacg acagatgaac cgcggcaaaa taaacataaattggaagcca 5460 tcggccacga ggggcaggga cgaaggtggt tttctgggcg ggggagggatattcgcgtca 5520 gaatccttta ctgttcttaa ggattccgtt taagttgtag agctgactcattttaagtaa 5580 tgttgttact gagaagttta acccttacgg gacagatcca tggacctttatagatgatta 5640 cgaggaaagt gaaataacga ttttgtcctt agttatactt cgattaaaacatggcttcag 5700 aggctccttc ctgtaatgcg tatggattga tgtgcaaaac tgttttgggcctgggccgct 5760 ctgtatttga actttgttac ttttctcatt ttgtttgcaa tcttggttgaacattacatt 5820 gataagcata aggtctcaag cgaagggggt ctacctggtt atttttctttgaccctaagc 5880 acgtttataa aataacattg tttaaaatcg atagtggaca tcgggtaagtttggataaat 5940 tgtgaggtaa gtaatgagtt tttgcttttt gttagtgatt tgtaaaacttgttataaatg 6000 tacattatcc gtaatttcag tttagagata acctatgtgc tgacgacaattaagaataaa 6060 aactagctga aaaaatgaaa ataactatcg tgacaagtaa ccatttcaaaagactgcttt 6120 gtgtctcata ggagctagtt tgatcatttc agttaatttt ttctttaatttttacgagtc 6180 atgaaaacta caggaaaaaa aatctgaact gggttttacc actactttttaggagttggg 6240 agcatgcgaa tggagggaga gctccgtaga actgggatga gagcagcaattaatgctgct 6300 tgctaggaac aaaaaataat tgattgaaaa ttacgtgtga ctttttagtttgcattatgc 6360 gtttgtagca gttggtcctg gatatcactt tctctcgttt gaggttttttaacctagtta 6420 acttttaaga caggtttcct taacattcat aagtgcccag aatacagctgtgtagtacag 6480 catataaaga tttcagctct gaggtttttc ctattgactt ggaaaattgttttgtgcctg 6540 tcgcttgcca catggccaat caagtaagct tcgaattcga gctcgcccaactccgcccgt 6600 tttatgacta gaaccaatag tttttaatgc caaatgcact gaaatcccctaatttgcaaa 6660 gccaaacgcc ccctatgtga gtaatacggg gactttttac ccaatttcccaagcggaaag 6720 ccccctaata cactcatatg gcatatgaat cagcacggtc atgcactctaatggcggccc 6780 atagggactt tccacatagg gggcgttcac catttcccag cataggggtggtgactcaat 6840 ggcctttacc caagtacatt gggtcaatgg gaggtaagcc 
aatgggtttttcccattact 6900 ggcaagcaca ctgagtcaaa tgggactttc cactgggttt tgcccaagtacattgggtca 6960 atgggaggtg agccaatggg aaaaacccat tgctgccaag tacactgactcaatagggac 7020 tttccaatgg gtttttccat tgttggcaag catataaggt caatgtgggtgagtcaatag 7080 ggactttcca ttgtattctg cccagtacat aaggtcaata gggggtgaatcaacaggaaa 7140 gtcccattgg agccaagtac actgcgtcaa tagggacttt ccattgggttttgcccagta 7200 cataaggtca ataggggatg agtcaatggg aaaaacccat tggagccaagtacactgact 7260 caatagggac tttccattgg gttttgccca gtacataagg tcaatagggggtgagtcaac 7320 aggaaagtcc cattggagcc aagtacattg agtcaatagg gactttccaatgggttttgc 7380 ccagtacata aggtcaatgg gaggtaagcc aatgggtttt tcccattactggcacgtata 7440 ctgagtcatt agggactttc caatgggttt tgcccagtac ataaggtcaataggggtgaa 7500 tcaacaggaa agtcccattg gagccaagta cactgagtca atagggactttccattgggt 7560 tttgcccagt acaaaaggtc aatagggggt gagtcaatgg gtttttcccattattggcac 7620 gtacataagg tcaatagggg tgagtcattg ggtttttcca gccaatttaattaaaacgcc 7680 atgtactttc ccaccattga cgtcaatggg ctattgaaac taatgcaacgtgacctttaa 7740 acggtacttt cccatagctg attaatggga aagtaccgtt ctcgagccaatacacgtcaa 7800 tgggaagtga aagggcagcc aaaacgtaac accgccccgg ttttcccctggaaattccat 7860 attggcacgc attctattgg ctgagctgcg ttctacgtgg gtataagaggcgcgaccagc 7920 gtcggtaccg tcgcagtctt cggtctgacc accgtagaac gcagagctcctcgctgcagc 7980 ccgggtctag aggatccgcc tgagaaagga agtgagctgt aaaggctgagctctctctct 8040 gacgtatgta gcctctggtt agcttcgtca ctcactgttc ttgactcagcatggcaatct 8100 gatgaaatcc cagctgtaag tctgcagaaa ttgatgatct attaaacaataaagatgtcc 8160 actaaaatgg aagtttttcc tgtcatactt tgttaagaag ggtgagaacagagtacctac 8220 attttgaatg gaaggattgg agctacgggg gtgggggtgg ggtgggattagataaatgcc 8280 tgctctttac tgaaggctct ttactattgc tttatgataa tgtttcatagttggatatca 8340 taatttaaac aagcaaaacc aaattaaggg ccagctcatt cctccagatccactagttct 8400 agagcaaatt ctaccgggta ggggaggcgc ttttcccaag gcagtctggagcatgcgctt 8460 tagcagcccc gctgggcact tggcgctaca caagtggcct ctggcctcgcacacattcca 8520 catccaccgg taggcgccaa ccggctccgt tctttggtgg ccccttcgcgccaccttcta 8580 ctcctcccct 
agtcaggaag ttcccccccg ccccgcagct cgcgtcgtgcaggacgtgac 8640 aaatggaagt agcacgtctc actagtctcg tgcagatgga cagcaccgctgagcaatgga 8700 agcgggtagg cctttggggc agcggccaat agcagctttg ctccttcgctttctgggctc 8760 agaggctggg aaggggtggg tccgggggcg ggctcagggg cgggctcaggggcggggcgg 8820 gcgcccgaag gtcctccgga ggcccggcat tctgcacgct tcaaaagcgcacgtctgccg 8880 cgctgttctc ctcttcctca tctccgggcc tttcgaccag cttaccatgaccgagtacaa 8940 gcccacggtg cgcctcgcca cccgcgacga cgtccccagg gccgtacgcaccctcgccgc 9000 cgcgttcgcc gactaccccg ccacgcgcca caccgtcgat ccggaccgccacatcgagcg 9060 ggtcaccgag ctgcaagaac tcttcctcac gcgcgtcggg ctcgacatcggcaaggtgtg 9120 ggtcgcggac gacggcgccg cggtggcggt ctggaccacg ccggagagcgtcgaagcggg 9180 ggcggtgttc gccgagatcg gcccgcgcat ggccgagttg agcggttcccggctggccgc 9240 gcagcaacag atggaaggcc tcctggcgcc gcaccggccc aaggagcccgcgtggttcct 9300 ggccaccgtc ggcgtctcgc ccgaccacca gggcaagggt ctgggcagcgccgtcgtgct 9360 ccccggagtg gaggcggccg agcgcgccgg ggtgcccgcc ttcctggagacctccgcgcc 9420 ccgcaacctc cccttctacg agcggctcgg cttcaccgtc accgccgacgtcgaggtgcc 9480 cgaaggaccg cgcacctggt gcatgacccg caagcccggt gcctgacgcccgccccacga 9540 cccgcagcgc ccgaccgaaa ggagcgcacg accccatgca taggttgggcttcggaatcg 9600 ttttccggga cgccggctgg atgatcctcc agcgcgggga tctcatgctggagttcttcg 9660 cccaccccaa cttgtttatt gcagcttata atggttacaa ataaagcaatagcatcacaa 9720 atttcacaaa taaagcattt ttttcactgc attctagttg tggtttgtccaaactcatca 9780 atgtatctta tcatgtctgt ataccgtcga gatctagagc ggccgccaccgcggtggagc 9840 tccagctttt gttcccttta gtgagggtta atttcgagct tggcgtaatcatggtcatag 9900 ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacgagccggaagc 9960 ataaagtgta aagcctgggg tgcctaatga gtgagctaac tcacattaattgcgttgcgc 10020 tcactgcccg ctttccagtc gggaaacctg tcgtgccagg gggtacctaggccgggcaac 10080 aattggcggc cggccgcact tttcggggaa atgtgcgcgg aacccctatttgtttatttt 10140 tctaaataca ttcaaatatg tatccgctca tgagacaata accctgataaatgcttcaat 10200 aatattgaaa aaggaagagt atgagtattc aacatttccg tgtcgcccttattccctttt 10260 ttgcggcatt ttgccttcct gtttttgctc acccagaaac 
gctggtgaaagtaaaagatg 10320 ctgaagatca gttgggtgca cgagtgggtt acatcgaact ggatctcaacagcggtaaga 10380 tccttgagag ttttcgcccc gaagaacgtt ttccaatgat gagcacttttaaagttctgc 10440 tatgtggcgc ggtattatcc cgtattgacg ccgggcaaga gcaactcggtcgccgcatac 10500 actattctca gaatgacttg gttgagtact caccagtcac agaaaagcatcttacggatg 10560 gcatgacagt aagagaatta tgcagtgctg ccataaccat gagtgataacactgcggcca 10620 acttacttct gacaacgatc ggaggaccga aggagctaac cgcttttttgcacaacatgg 10680 gggatcatgt aactcgcctt gatcgttggg aaccggagct gaatgaagccataccaaacg 10740 acgagcgtga caccacgatg cctgtagcaa tggcaacaac gttgcgcaaactattaactg 10800 gcgaactact tactctagct tcccggcaac aattaataga ctggatggaggcggataaag 10860 ttgcaggacc acttctgcgc tcggcccttc cggctggctg gtttattgctgataaatctg 10920 gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagatggtaagccct 10980 cccgtatcgt agttatctac acgacgggga gtcaggcaac tatggatgaacgaaatagac 11040 agatcgctga gataggtgcc tcactgatta agcattggta actgtcagaccctaggccgg 11100 gcaacaattg gcggccggcc ctgcattaat gaatcggcca acgcgcggggagaggcggtt 11160 tgcgtattgg gcgctcttcc gcttcctcgc tcactgactc gctgcgctcggtcgttcggc 11220 tgcggcgagc ggtatcagct cactcaaagg cggtaatacg gttatccacagaatcagggg 11280 ataacgcagg aaagaacatg tgagcaaaag gccagcaaaa ggccaggaaccgtaaaaagg 11340 ccgcgttgct ggcgtttttc cataggctcc gcccccctga cgagcatcacaaaaatcgac 11400 gctcaagtca gaggtggcga aacccgacag gactataaag ataccaggcgtttccccctg 11460 gaagctccct cgtgcgctct cctgttccga ccctgccgct taccggatacctgtccgcct 11520 ttctcccttc gggaagcgtg gcgctttctc atagctcacg ctgtaggtatctcagttcgg 11580 tgtaggtcgt tcgctccaag ctgggctgtg tgcacgaacc ccccgttcagcccgaccgct 11640 gcgccttatc cggtaactat cgtcttgagt ccaacccggt aagacacgacttatcgccac 11700 tggcagcagc cactggtaac aggattagca gagcgaggta tgtaggcggtgctacagagt 11760 tcttgaagtg gtggcctaac tacggctaca ctagaaggac agtatttggtatctgcgctc 11820 tgctgaagcc agttaccttc ggaaaaagag ttggtagctc ttgatccggcaaacaaacca 11880 ccgctggtag cggtggtttt tttgtttgca agcagcagat tacgcgcagaaaaaaaggat 11940 ctcaagaaga tcctttgatc ttttctacgg ggtctgacgc 
tcagtggaacgaaaactc 11998 29 12052 DNA Artificial Sequence Artificial Sequencecontaining human UCOE elements and vector sequence 29 acgttgtaaaacgacggcca gtgaattgta atacgactca ctatagggcg aattgggtac 60 cgggccccccctcgaggtcg agttggggtg gggaaaagga agaaacgcgg gcgtattggc 120 cccaatggggtctcggtggg gtatcgacag agtgccagcc ctgggaccga accccgcgtt 180 tatgaacaaacgacccaaca cccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240 cgggttccttccggtattgt ctccttccgt cgacggtatc aaggtggcga ccggaatggt 300 gagctgcgagaatagccggg cgcgctgtga gccgaagtcg cccccgccct ggccacttcc 360 ggcgcgccgagtccttaggc cgccaggggg cgccggcgcg cgcccagatt ggggacaaag 420 gaagccgggccggccgcgtt attaccataa aaggcaaaca ctggtcggag gcgtccccgc 480 ggcgcgcggcaggaagccag gccccaaccc cctcccaacc gggcgccagc cccgcctccg 540 cccggttcaaacagcgaccg ggtcgcgcgc gcgcacgcag cggccacacc ctcgggcgcc 600 agcggctcgggcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgc gcgattagcg 660 ccattgagtcccagcgcgca cgcgcaatta gcgccaattc ccagcgcgca cgcagttagc 720 gcccaaaggaccagcgcgca cgcgcatggc gccccagccc ccaccgggcc tgacgggggc 780 tacgccgcgcccaccgtgcg atccccattg gcaagagccc ggctcagaca aagaccccgc 840 cggttgcccccgccccgaga gcggcacccc cggagcgcgc ccgcccgagc gcggcctcgc 900 gcctgcgaactggcgtgggg tgtcccccat ctccggaggc ccaggggctt ctcccgcgcc 960 ccccacggcggtccggttcc gccccatgcg ccccccgctg cggcccagac ggcggctctg 1020 cacgggcgaagggccgcggc cgcatgcccc ggtcggctgg ccgggcttac ctggcggcgg 1080 gtgtggacgggcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcg gacgcggtct 1140 cggcggtggtggcgcgtcgc gccgctgggt tttatagggc gccgccgcgg ccgctcgagc 1200 cataaaaggcaactttcgga acggcgcacg ctgattggcc ccgcgccgct cactcaccgg 1260 cttcgccgcacagtgcagca tttttttacc ccctctcccc tccttttgcg aaaaaaaaaa 1320 agagcgagagcgagattgag gaagaggagg agggagagtt ttggcgttgg ccgccttggg 1380 gtgctgggcccgggggctgg gggcgcgcgc cgtggccccc gcgccccacg ctgggcagtg 1440 cccggttcggccccgcatgg ccaggcctgc ccccggcctg cccgtctctc gggcccccca 1500 cccaccgcgggacatcctag gtgtggacat ctcttgggca ctgagcgccc aggtggggtg 1560 ggccagggtctgcacgggtg ccagggccct gggttctgta cgctcctgca gaaggagctc 
1620 ttggagggcatggagtggcc aggcagtcac tcccccttgc cgacttcaga gcaactgccc 1680 tgaaagcagggcctgaggac ctctggctgt ggggctcagc tagctaaatg tgctgggtgg 1740 gtcactagggagagacctgg gcttgagagg tagagtgtgg tgttggggga gtcaggtggc 1800 ttgcggccattagagtcgca ggaccacact ccccaggaca gggcaggggc cagcggtcca 1860 gtggctggaggtggcccgtg atgaaggcta caaacctacc cagccgcagc cctgggaagg 1920 aagtgggctctacagggcag ggcacctttt accctggagc tgcctgcttt tgagggtaac 1980 agtcacgcccagccaagacc aggcctgggg cgttagtggg tgacctaggc actgcggggc 2040 gggggggctgggtctacaca gcctgggtct gggcccaccg tccgttgtat gtctgctatg 2100 cgcagccacagctgaactgc cctcccagac catctggagg ccgctggggg actctgggga 2160 ccaagactccatgtgccaca gaggattggg ggcggggcgg tgctaggaac tcaaagccag 2220 cctgggaagaccctgtcctt gtcacccttt cttgccttgg gtctgtccac tgagtagcac 2280 acaagaccgggtgggcaggg tccgttctgc tccgggaatc acagactgtg tgtacccagg 2340 tggtgggcatgcagcgatca gtggcgtggg accacagagg gggcccgcgg taccaagctt 2400 gggaattgcgtgcaaaaaca acttctgttt tccagggtaa acagaatcta atgcagaatc 2460 taatgcagggtaaacagact taatgcagaa tctaatgatg gcacaaatta aaaatcacta 2520 acgtgccctttttagtgtga aacccagaga gagcacatac aagccaaaaa caaatgcttt 2580 attttacctaggagacatta acattcacct ttacgtgttt aagattaatg caatgttaaa 2640 tattgtgaaaactgtaactt tgaatttcat gatttttatg tgaatattcc agggtttaaa 2700 aaaacttgtaacatgacatg gctgaataag ataaaaaaaa aatctagcct tttctccctt 2760 ctggctcatatttgcgattt cgatcatttt gtttaaaaaa caaaacactg caatgaatta 2820 aacttaatattcttctatgt tttagagtaa gttaaaacaa gataaagtga ccaaagtaat 2880 ttgaaagattcaatgacttt tgctccaacc taggtgcaca aggtaccttg ttctttaaat 2940 tgggctttaatgaaaatact tctccagaat tctggggatt taagaaaaat tatgccaacc 3000 aacaagggctttaccatttt atgtaacatt tttcaacgct gcaaaaatgt gtgtatttct 3060 atttgaagataaaaatcctc agcaaaatcc acattgcact gtccttcaaa gattagcctt 3120 ctttgaactagttaagacac tattaagcca agccagtatc tccctgtaat gaattcgttt 3180 ttctcttaattttcccctgt aatttacact gggagagctg ggaaatatgt ggatgtaaat 3240 ttctcagccacagagatgca aagttatact gtggggaaaa aaaacttgag ttaaatcctt 3300 acatattttaggttttcatt aacttaccaa 
tgtagttttg ttggaggcca ttttttttat 3360 tgcagacttgaagagctatt actagaaaaa tgcatgacag ttaaggtaag tttgcatgac 3420 acaaaaaaggtaactaaata caaattctgt ttggattcca acccccaagt agagagcgca 3480 cactttcaaacgtgaataca aatccagagt agatctgcgc tcctacctac attgcttatg 3540 atgtacttaagtacgtgtcc taaccatgtg agtctagaaa gactttactg gggatcctgg 3600 tacctaaaacagcttcacat ggcttaaaat aggggaccaa tgtcttttcc aatctaagtc 3660 ccatttataataaagtccat gttccatttt taaaggacaa tcctttcggt ttaaaaccag 3720 gcacgattacccaaacaact cacaacggta aagcactgtg aatcttctct gttctgcaat 3780 cccaacttggtttctgctca gaaaccctcc ctctttccaa tcggtaatta aataacaaaa 3840 ggaaaaaacttaagatgctt caaccccgtt tcgtgacact ttgaaaaaag aatcacctct 3900 tgcaaacacccgctcccgac ccccgccgct gaagcccggc gtccagaggc ctaagcgcgg 3960 gtgcccgcccccacccggga gcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaa 4020 aggccgcggcacgggggctc aagggcactg cgccacaccg cacgcgccta cccccgcgcg 4080 gccacgttaactggcggtcg ccgcagcctc gggacagccg gccgcgcgcc gccaggctcg 4140 cggacgcgggaccacgcgcc gccctccggg aggcccaagt ctcgacccag ccccgcgtgg 4200 cgctgggggagggggcgcct ccgccggaac gcgggtgggg gaggggaggg ggaaatgcgc 4260 tttgtctcgaaatggggcaa ccgtcgccac agctccctac cccctcgagg gcagagcagt 4320 ccccccactaactaccgggc tggccgcgcg ccaggccagc cgcgaggcca ccgcccgacc 4380 ctccactccttcccgcagct cccggcgcgg ggtccggcga gaaggggagg ggaggggagc 4440 ggagaaccgggcccccggga cgcgtgtggc atctgaagca ccaccagcga gcgagagcta 4500 gagagaaggaaagccaccga cttcaccgcc tccgagctgc tccgggtcgc gggtctgcag 4560 cgtctccggccctccgcgcc tacagctcaa gccacatccg aagggggagg gagccgggag 4620 ctgcgcgcggggccgccggg gggaggggtg gcaccgccca cgccgggcgg ccacgaaggg 4680 cggggcagcgggcgcgcgcg cggcgggggg aggggccggc gccgcgcccg ctgggaattg 4740 gggccctagggggagggcgg aggcgccgac gaccgcggca cttaccgttc gcggcgtggc 4800 gcccggtggtccccaagggg agggaagggg gaggcggggc gaggacagtg accggagtct 4860 cctcagcggtggcttttctg cttggcagcc tcagcggctg gcgccaaaac cggactccgc 4920 ccacttcctcgcccgccggt gcgagggtgt ggaatcctcc agacgctggg ggagggggag 4980 ttgggagcttaaaaactagt acccctttgg gaccactttc agcagcgaac tctcctgtac 5040 
accaggggtcagttccacag acgcgggcca ggggtgggtc attgcggcgt gaacaataat 5100 ttgactagaagttgattcgg gtgtttccgg aaggggccga gtcaatccgc cgagttgggg 5160 cacggaaaacaaaaagggaa ggctactaag atttttctgg cgggggttat cattggcgta 5220 actgcagggaccacctcccg ggttgagggg gctggatctc caggctgcgg attaagcccc 5280 tcccgtcggcgttaatttca aactgcgcga cgtttctcac ctgccttcgc caaggcaggg 5340 gccgggaccctattccaaga ggtagtaact agcaggactc tagccttccg caattcattg 5400 agcgcatttacggaagtaac gtcgggtact gtctctggcc gcaagggtgg gaggagtacg 5460 catttggcgtaaggtggggc gtagagcctt cccgccattg gcggcggata gggcgtttac 5520 gcgacggcctgacgtagcgg aagacgcgtt agtggggggg aaggttctag aaaagcggcg 5580 gcagcggctctagcggcagt agcagcagcg ccgggtcccg tgcggaggtg ctcctcgcag 5640 agttgtttctcgagcagcgg cagttctcac tacagcgcca ggacgagtcc ggttcgtgtt 5700 cgtccgcggagatctctctc atctcgctcg gctgcgggaa atcgggctga agcgactgag 5760 tccgcgatggaggtaacggg tttgaaatca atgagttatt gaaaagggca tggcgaggcc 5820 gttggcgcctcagtggaagt cggccagccg cctccgtggg agagaggcag gaaatcggac 5880 caattcagtagcagtggggc ttaaggttta tgaacggggt cttgagcgga ggcctgagcg 5940 tacaaacagcttccccaccc tcagcctccc ggcgccattt cccttcactg ggggtggggg 6000 atggggagctttcacatggc ggacgctgcc ccgctggggt gaaagtgggg cgcggaggcg 6060 ggaattcttattccctttct aaagcacgct gcttcggggg ccacggcgtc tcctcggcga 6120 gcgtttcggcgggcagcagg tcctcgtgag cgaggctgcg gagcttcccc tccccctctc 6180 tcccgggaaccgatttggcg gccgccattt tcatggctcg ccttcctctc agcgttttcc 6240 ttataactcttttattttct tagtgtgctt tctctatcaa gaagtagaag tggttaacta 6300 ttttttttttcttctcgggc tgttttcata tcgtttcgag gtggatttgg agtgttttgt 6360 gagcttggatctttagagtc ctgcgcacct cattaaaggc gctcagcctt cccctcgatg 6420 aaatggcgccattgcgttcg gaagccacac cgaagagcgg ggaggggggg tgctccgggt 6480 ttgcgggcccggtttcagag aagatcccaa gcttcgaatt cgagctcgcc caactccgcc 6540 cgttttatgactagaaccaa tagtttttaa tgccaaatgc actgaaatcc cctaatttgc 6600 aaagccaaacgccccctatg tgagtaatac ggggactttt tacccaattt cccaagcgga 6660 aagccccctaatacactcat atggcatatg aatcagcacg gtcatgcact ctaatggcgg 6720 cccatagggactttccacat agggggcgtt 
caccatttcc cagcataggg gtggtgactc 6780 aatggcctttacccaagtac attgggtcaa tgggaggtaa gccaatgggt ttttcccatt 6840 actggcaagcacactgagtc aaatgggact ttccactggg ttttgcccaa gtacattggg 6900 tcaatgggaggtgagccaat gggaaaaacc cattgctgcc aagtacactg actcaatagg 6960 gactttccaatgggtttttc cattgttggc aagcatataa ggtcaatgtg ggtgagtcaa 7020 tagggactttccattgtatt ctgcccagta cataaggtca atagggggtg aatcaacagg 7080 aaagtcccattggagccaag tacactgcgt caatagggac tttccattgg gttttgccca 7140 gtacataaggtcaatagggg atgagtcaat gggaaaaacc cattggagcc aagtacactg 7200 actcaatagggactttccat tgggttttgc ccagtacata aggtcaatag ggggtgagtc 7260 aacaggaaagtcccattgga gccaagtaca ttgagtcaat agggactttc caatgggttt 7320 tgcccagtacataaggtcaa tgggaggtaa gccaatgggt ttttcccatt actggcacgt 7380 atactgagtcattagggact ttccaatggg ttttgcccag tacataaggt caataggggt 7440 gaatcaacaggaaagtccca ttggagccaa gtacactgag tcaataggga ctttccattg 7500 ggttttgcccagtacaaaag gtcaataggg ggtgagtcaa tgggtttttc ccattattgg 7560 cacgtacataaggtcaatag gggtgagtca ttgggttttt ccagccaatt taattaaaac 7620 gccatgtactttcccaccat tgacgtcaat gggctattga aactaatgca acgtgacctt 7680 taaacggtactttcccatag ctgattaatg ggaaagtacc gttctcgagc caatacacgt 7740 caatgggaagtgaaagggca gccaaaacgt aacaccgccc cggttttccc ctggaaattc 7800 catattggcacgcattctat tggctgagct gcgttctacg tgggtataag aggcgcgacc 7860 agcgtcggtaccgtcgcagt cttcggtctg accaccgtag aacgcagagc tcctcgctgc 7920 agcccgggtctagaggatcc gcctgagaaa ggaagtgagc tgtaaaggct gagctctctc 7980 tctgacgtatgtagcctctg gttagcttcg tcactcactg ttcttgactc agcatggcaa 8040 tctgatgaaatcccagctgt aagtctgcag aaattgatga tctattaaac aataaagatg 8100 tccactaaaatggaagtttt tcctgtcata ctttgttaag aagggtgaga acagagtacc 8160 tacattttgaatggaaggat tggagctacg ggggtggggg tggggtggga ttagataaat 8220 gcctgctctttactgaaggc tctttactat tgctttatga taatgtttca tagttggata 8280 tcataatttaaacaagcaaa accaaattaa gggccagctc attcctccag atccactagt 8340 aattctgtggaatgtgtgtc agttagggtg tggaaagtcc ccaggctccc cagcaggcag 8400 aagtatgcaaagcatgcatc tcaattagtc agcaaccagg tgtggaaagt ccccaggctc 8460 
cccagcaggcagaagtatgc aaagcatgca tctcaattag tcagcaacca tagtcccgcc 8520 cctaactccgcccatcccgc ccctaactcc gcccagttcc gcccattctc cgccccatgg 8580 ctgactaattttttttattt atgcagaggc cgaggccgcc tctgcctctg agctattcca 8640 gaagtagtgaggaggctttt ttggaggcct aggcttttgc aaaaagctcc cgggagcttg 8700 tatatccattttcggatctg atcaagagac aggatgagga tcgtttcgca tgattgaaca 8760 agatggattgcacgcaggtt ctccggccgc ttgggtggag aggctattcg gctatgactg 8820 ggcacaacagacaatcggct gctctgatgc cgccgtgttc cggctgtcag cgcaggggcg 8880 cccggttctttttgtcaaga ccgacctgtc cggtgccctg aatgaactgc aggacgaggc 8940 agcgcggctatcstggctgg ccacgacggg cgttccttgc gcagctgtgc tcgacgttgt 9000 cactgaagcgggaagggact ggctgctatt gggcgaagtg ccggggcagg atctcctgtc 9060 atctcaccttgctcctgccg agaaagtatc catcatggct gatgcaatgc ggcggctgca 9120 tacgcttgatccggctacct gcccattcga ccaccaagcg aaacatcgca tcgagcgagc 9180 acgtactcggatggaagccg gtcttgtcga tcaggatgat ctggacgaag agcatcaggg 9240 gctcgcgccagccgaactgt tcgccaggct caaggcgcgc atgcccgacg gcgaggatct 9300 cgtcgtgacccatggcgatg cctgcttgcc gaatatcatg gtggaaaatg gccgcttttc 9360 tggattcatcgactgtggcc ggctgggtgt ggcggaccgc tatcaggaca tagcgttggc 9420 tacccgtgatattgctgaag agcttggcgg cgaatgggct gaccgcttcc tcgtgcttta 9480 cggtatcgccgctcccgatt cgcagcgcat cgccttctat cgccttcttg acgagttctt 9540 ctgagcgggactctggggtt cgaaatgacc gaccaagcga cgcccaacct gccatcacga 9600 gatttcgattccaccgccgc cttctatgaa aggttgggct tcggaatcgt tttccgggac 9660 gccggctggatgatcctcca gcgcggggat ctcatgctgg agttcttcgc ccaccccaac 9720 ttgtttattgcagcttataa tggttacaaa taaagcaata gcatcacaaa tttcacaaat 9780 aaagcatttttttcactgca ttctagttgt ggtttgtcca aactcatcaa tgtatcttat 9840 catgtctgtataccgtcgag actagttcta gagcggccgc caccgcggtg gagctccagc 9900 ttttgttccctttagtgagg gttaatttcg agcttggcgt aatcatggtc atagctgttt 9960 cctgtgtgaaattgttatcc gctcacaatt ccacacaaca tacgagccgg aagcataaag 10020 tgtaaagcctggggtgccta atgagtgagc taactcacat taattgcgtt gcgctcactg 10080 cccgctttccagtcgggaaa cctgtcgtgc cagggggtac ctaggccggg caacaattgg 10140 cggccggccgcacttttcgg ggaaatgtgc 
gcggaacccc tatttgttta tttttctaaa 10200 tacattcaaatatgtatccg ctcatgagac aataaccctg ataaatgctt caataatatt 10260 gaaaaaggaagagtatgagt attcaacatt tccgtgtcgc ccttattccc ttttttgcgg 10320 cattttgccttcctgttttt gctcacccag aaacgctggt gaaagtaaaa gatgctgaag 10380 atcagttgggtgcacgagtg ggttacatcg aactggatct caacagcggt aagatccttg 10440 agagttttcgccccgaagaa cgttttccaa tgatgagcac ttttaaagtt ctgctatgtg 10500 gcgcggtattatcccgtatt gacgccgggc aagagcaact cggtcgccgc atacactatt 10560 ctcagaatgacttggttgag tactcaccag tcacagaaaa gcatcttacg gatggcatga 10620 cagtaagagaattatgcagt gctgccataa ccatgagtga taacactgcg gccaacttac 10680 ttctgacaacgatcggagga ccgaaggagc taaccgcttt tttgcacaac atgggggatc 10740 atgtaactcgccttgatcgt tgggaaccgg agctgaatga agccatacca aacgacgagc 10800 gtgacaccacgatgcctgta gcaatggcaa caacgttgcg caaactatta actggcgaac 10860 tacttactctagcttcccgg caacaattaa tagactggat ggaggcggat aaagttgcag 10920 gaccacttctgcgctcggcc cttccggctg gctggtttat tgctgataaa tctggagccg 10980 gtgagcgtgggtctcgcggt atcattgcag cactggggcc agatggtaag ccctcccgta 11040 tcgtagttatctacacgacg gggagtcagg caactatgga tgaacgaaat agacagatcg 11100 ctgagataggtgcctcactg attaagcatt ggtaactgtc agaccctagg ccgggcaaca 11160 attggcggccggccctgcat taatgaatcg gccaacgcgc ggggagaggc ggtttgcgta 11220 ttgggcgctcttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt cggctgcggc 11280 gagcggtatcagctcactca aaggcggtaa tacggttatc cacagaatca ggggataacg 11340 caggaaagaacatgtgagca aaaggccagc aaaaggccag gaaccgtaaa aaggccgcgt 11400 tgctggcgtttttccatagg ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa 11460 gtcagaggtggcgaaacccg acaggactat aaagatacca ggcgtttccc cctggaagct 11520 ccctcgtgcgctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc 11580 cttcgggaagcgtggcgctt tctcatagct cacgctgtag gtatctcagt tcggtgtagg 11640 tcgttcgctccaagctgggc tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct 11700 tatccggtaactatcgtctt gagtccaacc cggtaagaca cgacttatcg ccactggcag 11760 cagccactggtaacaggatt agcagagcga ggtatgtagg cggtgctaca gagttcttga 11820 agtggtggcctaactacggc tacactagaa ggacagtatt 
tggtatctgc gctctgctga 11880 agccagttaccttcggaaaa agagttggta gctcttgatc cggcaaacaa accaccgctg 11940 gtagcggtggtttttttgtt tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag 12000 aagatcctttgatcttttct acggggtctg acgctcagtg gaacgaaaac tc 12052 30 11941 DNAArtificial Sequence Artificial Sequence containing human UCOE elementsand vector sequence 30 acgttgtaaa acgacggcca gtgaattgta atacgactcactatagggcg aattgggtac 60 cgggcccccc ctcgaggtcg agttggggtg gggaaaaggaagaaacgcgg gcgtattggc 120 cccaatgggg tctcggtggg gtatcgacag agtgccagccctgggaccga accccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtctttttattgc cgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgt cgacggtatcaaggtggcga ccggaatggt 300 gagctgcgag aatagccggg cgcgctgtga gccgaagtcgcccccgccct ggccacttcc 360 ggcgcgccga gtccttaggc cgccaggggg cgccggcgcgcgcccagatt ggggacaaag 420 gaagccgggc cggccgcgtt attaccataa aaggcaaacactggtcggag gcgtccccgc 480 ggcgcgcggc aggaagccag gccccaaccc cctcccaaccgggcgccagc cccgcctccg 540 cccggttcaa acagcgaccg ggtcgcgcgc gcgcacgcagcggccacacc ctcgggcgcc 600 agcggctcgg gcaggaagtg gcgcaagcgc ccgggccccagaacgcacgc gcgattagcg 660 ccattgagtc ccagcgcgca cgcgcaatta gcgccaattcccagcgcgca cgcagttagc 720 gcccaaagga ccagcgcgca cgcgcatggc gccccagcccccaccgggcc tgacgggggc 780 tacgccgcgc ccaccgtgcg atccccattg gcaagagcccggctcagaca aagaccccgc 840 cggttgcccc cgccccgaga gcggcacccc cggagcgcgcccgcccgagc gcggcctcgc 900 gcctgcgaac tggcgtgggg tgtcccccat ctccggaggcccaggggctt ctcccgcgcc 960 ccccacggcg gtccggttcc gccccatgcg ccccccgctgcggcccagac ggcggctctg 1020 cacgggcgaa gggccgcggc cgcatgcccc ggtcggctggccgggcttac ctggcggcgg 1080 gtgtggacgg gcggcggatc ggcaaaggcg aggctctgtgctcgcgggcg gacgcggtct 1140 cggcggtggt ggcgcgtcgc gccgctgggt tttatagggcgccgccgcgg ccgctcgagc 1200 cataaaaggc aactttcgga acggcgcacg ctgattggccccgcgccgct cactcaccgg 1260 cttcgccgca cagtgcagca tttttttacc ccctctcccctccttttgcg aaaaaaaaaa 1320 agagcgagag cgagattgag gaagaggagg agggagagttttggcgttgg ccgccttggg 1380 gtgctgggcc cgggggctgg gggcgcgcgc cgtggcccccgcgccccacg ctgggcagtg 
1440 cccggttcgg ccccgcatgg ccaggcctgc ccccggcctgcccgtctctc gggcccccca 1500 cccaccgcgg gacatcctag gtgtggacat ctcttgggcactgagcgccc aggtggggtg 1560 ggccagggtc tgcacgggtg ccagggccct gggttctgtacgctcctgca gaaggagctc 1620 ttggagggca tggagtggcc aggcagtcac tcccccttgccgacttcaga gcaactgccc 1680 tgaaagcagg gcctgaggac ctctggctgt ggggctcagctagctaaatg tgctgggtgg 1740 gtcactaggg agagacctgg gcttgagagg tagagtgtggtgttggggga gtcaggtggc 1800 ttgcggccat tagagtcgca ggaccacact ccccaggacagggcaggggc cagcggtcca 1860 gtggctggag gtggcccgtg atgaaggcta caaacctacccagccgcagc cctgggaagg 1920 aagtgggctc tacagggcag ggcacctttt accctggagctgcctgcttt tgagggtaac 1980 agtcacgccc agccaagacc aggcctgggg cgttagtgggtgacctaggc actgcggggc 2040 gggggggctg ggtctacaca gcctgggtct gggcccaccgtccgttgtat gtctgctatg 2100 cgcagccaca gctgaactgc cctcccagac catctggaggccgctggggg actctgggga 2160 ccaagactcc atgtgccaca gaggattggg ggcggggcggtgctaggaac tcaaagccag 2220 cctgggaaga ccctgtcctt gtcacccttt cttgccttgggtctgtccac tgagtagcac 2280 acaagaccgg gtgggcaggg tccgttctgc tccgggaatcacagactgtg tgtacccagg 2340 tggtgggcat gcagcgatca gtggcgtggg accacagagggggcccgcgg taccaagctt 2400 gggaattgcg tgcaaaaaca acttctgttt tccagggtaaacagaatcta atgcagaatc 2460 taatgcaggg taaacagact taatgcagaa tctaatgatggcacaaatta aaaatcacta 2520 acgtgccctt tttagtgtga aacccagaga gagcacatacaagccaaaaa caaatgcttt 2580 attttaccta ggagacatta acattcacct ttacgtgtttaagattaatg caatgttaaa 2640 tattgtgaaa actgtaactt tgaatttcat gatttttatgtgaatattcc agggtttaaa 2700 aaaacttgta acatgacatg gctgaataag ataaaaaaaaaatctagcct tttctccctt 2760 ctggctcata tttgcgattt cgatcatttt gtttaaaaaacaaaacactg caatgaatta 2820 aacttaatat tcttctatgt tttagagtaa gttaaaacaagataaagtga ccaaagtaat 2880 ttgaaagatt caatgacttt tgctccaacc taggtgcacaaggtaccttg ttctttaaat 2940 tgggctttaa tgaaaatact tctccagaat tctggggatttaagaaaaat tatgccaacc 3000 aacaagggct ttaccatttt atgtaacatt tttcaacgctgcaaaaatgt gtgtatttct 3060 atttgaagat aaaaatcctc agcaaaatcc acattgcactgtccttcaaa gattagcctt 3120 ctttgaacta gttaagacac tattaagcca 
agccagtatctccctgtaat gaattcgttt 3180 ttctcttaat tttcccctgt aatttacact gggagagctgggaaatatgt ggatgtaaat 3240 ttctcagcca cagagatgca aagttatact gtggggaaaaaaaacttgag ttaaatcctt 3300 acatatttta ggttttcatt aacttaccaa tgtagttttgttggaggcca ttttttttat 3360 tgcagacttg aagagctatt actagaaaaa tgcatgacagttaaggtaag tttgcatgac 3420 acaaaaaagg taactaaata caaattctgt ttggattccaacccccaagt agagagcgca 3480 cactttcaaa cgtgaataca aatccagagt agatctgcgctcctacctac attgcttatg 3540 atgtacttaa gtacgtgtcc taaccatgtg agtctagaaagactttactg gggatcctgg 3600 tacctaaaac agcttcacat ggcttaaaat aggggaccaatgtcttttcc aatctaagtc 3660 ccatttataa taaagtccat gttccatttt taaaggacaatcctttcggt ttaaaaccag 3720 gcacgattac ccaaacaact cacaacggta aagcactgtgaatcttctct gttctgcaat 3780 cccaacttgg tttctgctca gaaaccctcc ctctttccaatcggtaatta aataacaaaa 3840 ggaaaaaact taagatgctt caaccccgtt tcgtgacactttgaaaaaag aatcacctct 3900 tgcaaacacc cgctcccgac ccccgccgct gaagcccggcgtccagaggc ctaagcgcgg 3960 gtgcccgccc ccacccggga gcgcgggcct cgtggtcagcgcatccgcgg ggagaaacaa 4020 aggccgcggc acgggggctc aagggcactg cgccacaccgcacgcgccta cccccgcgcg 4080 gccacgttaa ctggcggtcg ccgcagcctc gggacagccggccgcgcgcc gccaggctcg 4140 cggacgcggg accacgcgcc gccctccggg aggcccaagtctcgacccag ccccgcgtgg 4200 cgctggggga gggggcgcct ccgccggaac gcgggtgggggaggggaggg ggaaatgcgc 4260 tttgtctcga aatggggcaa ccgtcgccac agctccctaccccctcgagg gcagagcagt 4320 ccccccacta actaccgggc tggccgcgcg ccaggccagccgcgaggcca ccgcccgacc 4380 ctccactcct tcccgcagct cccggcgcgg ggtccggcgagaaggggagg ggaggggagc 4440 ggagaaccgg gcccccggga cgcgtgtggc atctgaagcaccaccagcga gcgagagcta 4500 gagagaagga aagccaccga cttcaccgcc tccgagctgctccgggtcgc gggtctgcag 4560 cgtctccggc cctccgcgcc tacagctcaa gccacatccgaagggggagg gagccgggag 4620 ctgcgcgcgg ggccgccggg gggaggggtg gcaccgcccacgccgggcgg ccacgaaggg 4680 cggggcagcg ggcgcgcgcg cggcgggggg aggggccggcgccgcgcccg ctgggaattg 4740 gggccctagg gggagggcgg aggcgccgac gaccgcggcacttaccgttc gcggcgtggc 4800 gcccggtggt ccccaagggg agggaagggg gaggcggggcgaggacagtg accggagtct 4860 
cctcagcggt ggcttttctg cttggcagcc tcagcggctggcgccaaaac cggactccgc 4920 ccacttcctc gcccgccggt gcgagggtgt ggaatcctccagacgctggg ggagggggag 4980 ttgggagctt aaaaactagt acccctttgg gaccactttcagcagcgaac tctcctgtac 5040 accaggggtc agttccacag acgcgggcca ggggtgggtcattgcggcgt gaacaataat 5100 ttgactagaa gttgattcgg gtgtttccgg aaggggccgagtcaatccgc cgagttgggg 5160 cacggaaaac aaaaagggaa ggctactaag atttttctggcgggggttat cattggcgta 5220 actgcaggga ccacctcccg ggttgagggg gctggatctccaggctgcgg attaagcccc 5280 tcccgtcggc gttaatttca aactgcgcga cgtttctcacctgccttcgc caaggcaggg 5340 gccgggaccc tattccaaga ggtagtaact agcaggactctagccttccg caattcattg 5400 agcgcattta cggaagtaac gtcgggtact gtctctggccgcaagggtgg gaggagtacg 5460 catttggcgt aaggtggggc gtagagcctt cccgccattggcggcggata gggcgtttac 5520 gcgacggcct gacgtagcgg aagacgcgtt agtgggggggaaggttctag aaaagcggcg 5580 gcagcggctc tagcggcagt agcagcagcg ccgggtcccgtgcggaggtg ctcctcgcag 5640 agttgtttct cgagcagcgg cagttctcac tacagcgccaggacgagtcc ggttcgtgtt 5700 cgtccgcgga gatctctctc atctcgctcg gctgcgggaaatcgggctga agcgactgag 5760 tccgcgatgg aggtaacggg tttgaaatca atgagttattgaaaagggca tggcgaggcc 5820 gttggcgcct cagtggaagt cggccagccg cctccgtgggagagaggcag gaaatcggac 5880 caattcagta gcagtggggc ttaaggttta tgaacggggtcttgagcgga ggcctgagcg 5940 tacaaacagc ttccccaccc tcagcctccc ggcgccatttcccttcactg ggggtggggg 6000 atggggagct ttcacatggc ggacgctgcc ccgctggggtgaaagtgggg cgcggaggcg 6060 ggaattctta ttccctttct aaagcacgct gcttcgggggccacggcgtc tcctcggcga 6120 gcgtttcggc gggcagcagg tcctcgtgag cgaggctgcggagcttcccc tccccctctc 6180 tcccgggaac cgatttggcg gccgccattt tcatggctcgccttcctctc agcgttttcc 6240 ttataactct tttattttct tagtgtgctt tctctatcaagaagtagaag tggttaacta 6300 tttttttttt cttctcgggc tgttttcata tcgtttcgaggtggatttgg agtgttttgt 6360 gagcttggat ctttagagtc ctgcgcacct cattaaaggcgctcagcctt cccctcgatg 6420 aaatggcgcc attgcgttcg gaagccacac cgaagagcggggaggggggg tgctccgggt 6480 ttgcgggccc ggtttcagag aagatcccaa gcttcgaattcgagctcgcc caactccgcc 6540 cgttttatga ctagaaccaa tagtttttaa 
tgccaaatgcactgaaatcc cctaatttgc 6600 aaagccaaac gccccctatg tgagtaatac ggggactttttacccaattt cccaagcgga 6660 aagcccccta atacactcat atggcatatg aatcagcacggtcatgcact ctaatggcgg 6720 cccataggga ctttccacat agggggcgtt caccatttcccagcataggg gtggtgactc 6780 aatggccttt acccaagtac attgggtcaa tgggaggtaagccaatgggt ttttcccatt 6840 actggcaagc acactgagtc aaatgggact ttccactgggttttgcccaa gtacattggg 6900 tcaatgggag gtgagccaat gggaaaaacc cattgctgccaagtacactg actcaatagg 6960 gactttccaa tgggtttttc cattgttggc aagcatataaggtcaatgtg ggtgagtcaa 7020 tagggacttt ccattgtatt ctgcccagta cataaggtcaatagggggtg aatcaacagg 7080 aaagtcccat tggagccaag tacactgcgt caatagggactttccattgg gttttgccca 7140 gtacataagg tcaatagggg atgagtcaat gggaaaaacccattggagcc aagtacactg 7200 actcaatagg gactttccat tgggttttgc ccagtacataaggtcaatag ggggtgagtc 7260 aacaggaaag tcccattgga gccaagtaca ttgagtcaatagggactttc caatgggttt 7320 tgcccagtac ataaggtcaa tgggaggtaa gccaatgggtttttcccatt actggcacgt 7380 atactgagtc attagggact ttccaatggg ttttgcccagtacataaggt caataggggt 7440 gaatcaacag gaaagtccca ttggagccaa gtacactgagtcaataggga ctttccattg 7500 ggttttgccc agtacaaaag gtcaataggg ggtgagtcaatgggtttttc ccattattgg 7560 cacgtacata aggtcaatag gggtgagtca ttgggtttttccagccaatt taattaaaac 7620 gccatgtact ttcccaccat tgacgtcaat gggctattgaaactaatgca acgtgacctt 7680 taaacggtac tttcccatag ctgattaatg ggaaagtaccgttctcgagc caatacacgt 7740 caatgggaag tgaaagggca gccaaaacgt aacaccgccccggttttccc ctggaaattc 7800 catattggca cgcattctat tggctgagct gcgttctacgtgggtataag aggcgcgacc 7860 agcgtcggta ccgtcgcagt cttcggtctg accaccgtagaacgcagagc tcctcgctgc 7920 agcccgggtc tagaggatcc gcctgagaaa ggaagtgagctgtaaaggct gagctctctc 7980 tctgacgtat gtagcctctg gttagcttcg tcactcactgttcttgactc agcatggcaa 8040 tctgatgaaa tcccagctgt aagtctgcag aaattgatgatctattaaac aataaagatg 8100 tccactaaaa tggaagtttt tcctgtcata ctttgttaagaagggtgaga acagagtacc 8160 tacattttga atggaaggat tggagctacg ggggtgggggtggggtggga ttagataaat 8220 gcctgctctt tactgaaggc tctttactat tgctttatgataatgtttca tagttggata 8280 
tcataattta aacaagcaaa accaaattaa gggccagctcattcctccag atccactagt 8340 tctagagcaa attctaccgg gtaggggagg cgcttttcccaaggcagtct ggagcatgcg 8400 ctttagcagc cccgctgggc acttggcgct acacaagtggcctctggcct cgcacacatt 8460 ccacatccac cggtaggcgc caaccggctc cgttctttggtggccccttc gcgccacctt 8520 ctactcctcc cctagtcagg aagttccccc ccgccccgcagctcgcgtcg tgcaggacgt 8580 gacaaatgga agtagcacgt ctcactagtc tcgtgcagatggacagcacc gctgagcaat 8640 ggaagcgggt aggcctttgg ggcagcggcc aatagcagctttgctccttc gctttctggg 8700 ctcagaggct gggaaggggt gggtccgggg gcgggctcaggggcgggctc aggggcgggg 8760 cgggcgcccg aaggtcctcc ggaggcccgg cattctgcacgcttcaaaag cgcacgtctg 8820 ccgcgctgtt ctcctcttcc tcatctccgg gcctttcgaccagcttacca tgaccgagta 8880 caagcccacg gtgcgcctcg ccacccgcga cgacgtccccagggccgtac gcaccctcgc 8940 cgccgcgttc gccgactacc ccgccacgcg ccacaccgtcgatccggacc gccacatcga 9000 gcgggtcacc gagctgcaag aactcttcct cacgcgcgtcgggctcgaca tcggcaaggt 9060 gtgggtcgcg gacgacggcg ccgcggtggc ggtctggaccacgccggaga gcgtcgaagc 9120 gggggcggtg ttcgccgaga tcggcccgcg catggccgagttgagcggtt cccggctggc 9180 cgcgcagcaa cagatggaag gcctcctggc gccgcaccggcccaaggagc ccgcgtggtt 9240 cctggccacc gtcggcgtct cgcccgacca ccagggcaagggtctgggca gcgccgtcgt 9300 gctccccgga gtggaggcgg ccgagcgcgc cggggtgcccgccttcctgg agacctccgc 9360 gccccgcaac ctccccttct acgagcggct cggcttcaccgtcaccgccg acgtcgaggt 9420 gcccgaagga ccgcgcacct ggtgcatgac ccgcaagcccggtgcctgac gcccgcccca 9480 cgacccgcag cgcccgaccg aaaggagcgc acgaccccatgcataggttg ggcttcggaa 9540 tcgttttccg ggacgccggc tggatgatcc tccagcgcggggatctcatg ctggagttct 9600 tcgcccaccc caacttgttt attgcagctt ataatggttacaaataaagc aatagcatca 9660 caaatttcac aaataaagca tttttttcac tgcattctagttgtggtttg tccaaactca 9720 tcaatgtatc ttatcatgtc tgtataccgt cgagatctagagcggccgcc accgcggtgg 9780 agctccagct tttgttccct ttagtgaggg ttaatttcgagcttggcgta atcatggtca 9840 tagctgtttc ctgtgtgaaa ttgttatccg ctcacaattccacacaacat acgagccgga 9900 agcataaagt gtaaagcctg gggtgcctaa tgagtgagctaactcacatt aattgcgttg 9960 cgctcactgc ccgctttcca gtcgggaaac 
ctgtcgtgccagggggtacc taggccgggc 10020 aacaattggc ggccggccgc acttttcggg gaaatgtgcgcggaacccct atttgtttat 10080 ttttctaaat acattcaaat atgtatccgc tcatgagacaataaccctga taaatgcttc 10140 aataatattg aaaaaggaag agtatgagta ttcaacatttccgtgtcgcc cttattccct 10200 tttttgcggc attttgcctt cctgtttttg ctcacccagaaacgctggtg aaagtaaaag 10260 atgctgaaga tcagttgggt gcacgagtgg gttacatcgaactggatctc aacagcggta 10320 agatccttga gagttttcgc cccgaagaac gttttccaatgatgagcact tttaaagttc 10380 tgctatgtgg cgcggtatta tcccgtattg acgccgggcaagagcaactc ggtcgccgca 10440 tacactattc tcagaatgac ttggttgagt actcaccagtcacagaaaag catcttacgg 10500 atggcatgac agtaagagaa ttatgcagtg ctgccataaccatgagtgat aacactgcgg 10560 ccaacttact tctgacaacg atcggaggac cgaaggagctaaccgctttt ttgcacaaca 10620 tgggggatca tgtaactcgc cttgatcgtt gggaaccggagctgaatgaa gccataccaa 10680 acgacgagcg tgacaccacg atgcctgtag caatggcaacaacgttgcgc aaactattaa 10740 ctggcgaact acttactcta gcttcccggc aacaattaatagactggatg gaggcggata 10800 aagttgcagg accacttctg cgctcggccc ttccggctggctggtttatt gctgataaat 10860 ctggagccgg tgagcgtggg tctcgcggta tcattgcagcactggggcca gatggtaagc 10920 cctcccgtat cgtagttatc tacacgacgg ggagtcaggcaactatggat gaacgaaata 10980 gacagatcgc tgagataggt gcctcactga ttaagcattggtaactgtca gaccctaggc 11040 cgggcaacaa ttggcggccg gccctgcatt aatgaatcggccaacgcgcg gggagaggcg 11100 gtttgcgtat tgggcgctct tccgcttcct cgctcactgactcgctgcgc tcggtcgttc 11160 ggctgcggcg agcggtatca gctcactcaa aggcggtaatacggttatcc acagaatcag 11220 gggataacgc aggaaagaac atgtgagcaa aaggccagcaaaaggccagg aaccgtaaaa 11280 aggccgcgtt gctggcgttt ttccataggc tccgcccccctgacgagcat cacaaaaatc 11340 gacgctcaag tcagaggtgg cgaaacccga caggactataaagataccag gcgtttcccc 11400 ctggaagctc cctcgtgcgc tctcctgttc cgaccctgccgcttaccgga tacctgtccg 11460 cctttctccc ttcgggaagc gtggcgcttt ctcatagctcacgctgtagg tatctcagtt 11520 cggtgtaggt cgttcgctcc aagctgggct gtgtgcacgaaccccccgtt cagcccgacc 11580 gctgcgcctt atccggtaac tatcgtcttg agtccaacccggtaagacac gacttatcgc 11640 cactggcagc agccactggt aacaggatta 
gcagagcgaggtatgtaggc ggtgctacag 11700 agttcttgaa gtggtggcct aactacggct acactagaaggacagtattt ggtatctgcg 11760 ctctgctgaa gccagttacc ttcggaaaaa gagttggtagctcttgatcc ggcaaacaaa 11820 ccaccgctgg tagcggtggt ttttttgttt gcaagcagcagattacgcgc agaaaaaaag 11880 gatctcaaga agatcctttg atcttttcta cggggtctgacgctcagtgg aacgaaaact 11940 c 11941 31 11216 DNA Artificial SequenceArtificial Sequence containing human UCOE elements and vector sequence31 acgttgtaaa acgacggcca gtgaattgta atacgactca ctatagggcg aattgggtac 60cgggcccccc ctcgaggtcg agttggggtg gggaaaagga agaaacgcgg gcgtattggc 120cccaatgggg tctcggtggg gtatcgacag agtgccagcc ctgggaccga accccgcgtt 180tatgaacaaa cgacccaaca cccgtgcgtt ttattctgtc tttttattgc cgtcatagcg 240cgggttcctt ccggtattgt ctccttccgt cgacggtatc aaggtggcga ccggaatggt 300gagctgcgag aatagccggg cgcgctgtga gccgaagtcg cccccgccct ggccacttcc 360ggcgcgccga gtccttaggc cgccaggggg cgccggcgcg cgcccagatt ggggacaaag 420gaagccgggc cggccgcgtt attaccataa aaggcaaaca ctggtcggag gcgtccccgc 480ggcgcgcggc aggaagccag gccccaaccc cctcccaacc gggcgccagc cccgcctccg 540cccggttcaa acagcgaccg ggtcgcgcgc gcgcacgcag cggccacacc ctcgggcgcc 600agcggctcgg gcaggaagtg gcgcaagcgc ccgggcccca gaacgcacgc gcgattagcg 660ccattgagtc ccagcgcgca cgcgcaatta gcgccaattc ccagcgcgca cgcagttagc 720gcccaaagga ccagcgcgca cgcgcatggc gccccagccc ccaccgggcc tgacgggggc 780tacgccgcgc ccaccgtgcg atccccattg gcaagagccc ggctcagaca aagaccccgc 840cggttgcccc cgccccgaga gcggcacccc cggagcgcgc ccgcccgagc gcggcctcgc 900gcctgcgaac tggcgtgggg tgtcccccat ctccggaggc ccaggggctt ctcccgcgcc 960ccccacggcg gtccggttcc gccccatgcg ccccccgctg cggcccagac ggcggctctg 1020cacgggcgaa gggccgcggc cgcatgcccc ggtcggctgg ccgggcttac ctggcggcgg 1080gtgtggacgg gcggcggatc ggcaaaggcg aggctctgtg ctcgcgggcg gacgcggtct 1140cggcggtggt ggcgcgtcgc gccgctgggt tttatagggc gccgccgcgg ccgctcgagc 1200cataaaaggc aactttcgga acggcgcacg ctgattggcc ccgcgccgct cactcaccgg 1260cttcgccgca cagtgcagca tttttttacc ccctctcccc tccttttgcg aaaaaaaaaa 1320agagcgagag cgagattgag gaagaggagg 
agggagagtt ttggcgttgg ccgccttggg 1380gtgctgggcc cgggggctgg gggcgcgcgc cgtggccccc gcgccccacg ctgggcagtg 1440cccggttcgg ccccgcatgg ccaggcctgc ccccggcctg cccgtctctc gggcccccca 1500cccaccgcgg gacatcctag gtgtggacat ctcttgggca ctgagcgccc aggtggggtg 1560ggccagggtc tgcacgggtg ccagggccct gggttctgta cgctcctgca gaaggagctc 1620ttggagggca tggagtggcc aggcagtcac tcccccttgc cgacttcaga gcaactgccc 1680tgaaagcagg gcctgaggac ctctggctgt ggggctcagc tagctaaatg tgctgggtgg 1740gtcactaggg agagacctgg gcttgagagg tagagtgtgg tgttggggga gtcaggtggc 1800ttgcggccat tagagtcgca ggaccacact ccccaggaca gggcaggggc cagcggtcca 1860gtggctggag gtggcccgtg atgaaggcta caaacctacc cagccgcagc cctgggaagg 1920aagtgggctc tacagggcag ggcacctttt accctggagc tgcctgcttt tgagggtaac 1980agtcacgccc agccaagacc aggcctgggg cgttagtggg tgacctaggc actgcggggc 2040gggggggctg ggtctacaca gcctgggtct gggcccaccg tccgttgtat gtctgctatg 2100cgcagccaca gctgaactgc cctcccagac catctggagg ccgctggggg actctgggga 2160ccaagactcc atgtgccaca gaggattggg ggcggggcgg tgctaggaac tcaaagccag 2220cctgggaaga ccctgtcctt gtcacccttt cttgccttgg gtctgtccac tgagtagcac 2280acaagaccgg gtgggcaggg tccgttctgc tccgggaatc acagactgtg tgtacccagg 2340tggtgggcat gcagcgatca gtggcgtggg accacagagg gggcccgcgg taccaagctt 2400gggaattgcg tgcaaaaaca acttctgttt tccagggtaa acagaatcta atgcagaatc 2460taatgcaggg taaacagact taatgcagaa tctaatgatg gcacaaatta aaaatcacta 2520acgtgccctt tttagtgtga aacccagaga gagcacatac aagccaaaaa caaatgcttt 2580attttaccta ggagacatta acattcacct ttacgtgttt aagattaatg caatgttaaa 2640tattgtgaaa actgtaactt tgaatttcat gatttttatg tgaatattcc agggtttaaa 2700aaaacttgta acatgacatg gctgaataag ataaaaaaaa aatctagcct tttctccctt 2760ctggctcata tttgcgattt cgatcatttt gtttaaaaaa caaaacactg caatgaatta 2820aacttaatat tcttctatgt tttagagtaa gttaaaacaa gataaagtga ccaaagtaat 2880ttgaaagatt caatgacttt tgctccaacc taggtgcaca aggtaccttg ttctttaaat 2940tgggctttaa tgaaaatact tctccagaat tctggggatt taagaaaaat tatgccaacc 3000aacaagggct ttaccatttt atgtaacatt tttcaacgct gcaaaaatgt gtgtatttct 
3060atttgaagat aaaaatcctc agcaaaatcc acattgcact gtccttcaaa gattagcctt 3120ctttgaacta gttaagacac tattaagcca agccagtatc tccctgtaat gaattcgttt 3180ttctcttaat tttcccctgt aatttacact gggagagctg ggaaatatgt ggatgtaaat 3240ttctcagcca cagagatgca aagttatact gtggggaaaa aaaacttgag ttaaatcctt 3300acatatttta ggttttcatt aacttaccaa tgtagttttg ttggaggcca ttttttttat 3360tgcagacttg aagagctatt actagaaaaa tgcatgacag ttaaggtaag tttgcatgac 3420acaaaaaagg taactaaata caaattctgt ttggattcca acccccaagt agagagcgca 3480cactttcaaa cgtgaataca aatccagagt agatctgcgc tcctacctac attgcttatg 3540atgtacttaa gtacgtgtcc taaccatgtg agtctagaaa gactttactg gggatcctgg 3600tacctaaaac agcttcacat ggcttaaaat aggggaccaa tgtcttttcc aatctaagtc 3660ccatttataa taaagtccat gttccatttt taaaggacaa tcctttcggt ttaaaaccag 3720gcacgattac ccaaacaact cacaacggta aagcactgtg aatcttctct gttctgcaat 3780cccaacttgg tttctgctca gaaaccctcc ctctttccaa tcggtaatta aataacaaaa 3840ggaaaaaact taagatgctt caaccccgtt tcgtgacact ttgaaaaaag aatcacctct 3900tgcaaacacc cgctcccgac ccccgccgct gaagcccggc gtccagaggc ctaagcgcgg 3960gtgcccgccc ccacccggga gcgcgggcct cgtggtcagc gcatccgcgg ggagaaacaa 4020aggccgcggc acgggggctc aagggcactg cgccacaccg cacgcgccta cccccgcgcg 4080gccacgttaa ctggcggtcg ccgcagcctc gggacagccg gccgcgcgcc gccaggctcg 4140cggacgcggg accacgcgcc gccctccggg aggcccaagt ctcgacccag ccccgcgtgg 4200cgctggggga gggggcgcct ccgccggaac gcgggtgggg gaggggaggg ggaaatgcgc 4260tttgtctcga aatggggcaa ccgtcgccac agctccctac cccctcgagg gcagagcagt 4320ccccccacta actaccgggc tggccgcgcg ccaggccagc cgcgaggcca ccgcccgacc 4380ctccactcct tcccgcagct cccggcgcgg ggtccggcga gaaggggagg ggaggggagc 4440ggagaaccgg gcccccggga cgcgtgtggc atctgaagca ccaccagcga gcgagagcta 4500gagagaagga aagccaccga cttcaccgcc tccgagctgc tccgggtcgc gggtctgcag 4560cgtctccggc cctccgcgcc tacagctcaa gccacatccg aagggggagg gagccgggag 4620ctgcgcgcgg ggccgccggg gggaggggtg gcaccgccca cgccgggcgg ccacgaaggg 4680cggggcagcg ggcgcgcgcg cggcgggggg aggggccggc gccgcgcccg ctgggaattg 4740gggccctagg gggagggcgg aggcgccgac 
gaccgcggca cttaccgttc gcggcgtggc 4800gcccggtggt ccccaagggg agggaagggg gaggcggggc gaggacagtg accggagtct 4860cctcagcggt ggcttttctg cttggcagcc tcagcggctg gcgccaaaac cggactccgc 4920ccacttcctc gcccgccggt gcgagggtgt ggaatcctcc agacgctggg ggagggggag 4980ttgggagctt aaaaactagt acccctttgg gaccactttc agcagcgaac tctcctgtac 5040accaggggtc agttccacag acgcgggcca ggggtgggtc attgcggcgt gaacaataat 5100ttgactagaa gttgattcgg gtgtttccgg aaggggccga gtcaatccgc cgagttgggg 5160cacggaaaac aaaaagggaa ggctactaag atttttctgg cgggggttat cattggcgta 5220actgcaggga ccacctcccg ggttgagggg gctggatctc caggctgcgg attaagcccc 5280tcccgtcggc gttaatttca aactgcgcga cgtttctcac ctgccttcgc caaggcaggg 5340gccgggaccc tattccaaga ggtagtaact agcaggactc tagccttccg caattcattg 5400agcgcattta cggaagtaac gtcgggtact gtctctggcc gcaagggtgg gaggagtacg 5460catttggcgt aaggtggggc gtagagcctt cccgccattg gcggcggata gggcgtttac 5520gcgacggcct gacgtagcgg aagacgcgtt agtggggggg aaggttctag aaaagcggcg 5580gcagcggctc tagcggcagt agcagcagcg ccgggtcccg tgcggaggtg ctcctcgcag 5640agttgtttct cgagcagcgg cagttctcac tacagcgcca ggacgagtcc ggttcgtgtt 5700cgtccgcgga gatctctctc atctcgctcg gctgcgggaa atcgggctga agcgactgag 5760tccgcgatgg aggtaacggg tttgaaatca atgagttatt gaaaagggca tggcgaggcc 5820gttggcgcct cagtggaagt cggccagccg cctccgtggg agagaggcag gaaatcggac 5880caattcagta gcagtggggc ttaaggttta tgaacggggt cttgagcgga ggcctgagcg 5940tacaaacagc ttccccaccc tcagcctccc ggcgccattt cccttcactg ggggtggggg 6000atggggagct ttcacatggc ggacgctgcc ccgctggggt gaaagtgggg cgcggaggcg 6060ggaattctta ttccctttct aaagcacgct gcttcggggg ccacggcgtc tcctcggcga 6120gcgtttcggc gggcagcagg tcctcgtgag cgaggctgcg gagcttcccc tccccctctc 6180tcccgggaac cgatttggcg gccgccattt tcatggctcg ccttcctctc agcgttttcc 6240ttataactct tttattttct tagtgtgctt tctctatcaa gaagtagaag tggttaacta 6300tttttttttt cttctcgggc tgttttcata tcgtttcgag gtggatttgg agtgttttgt 6360gagcttggat ctttagagtc ctgcgcacct cattaaaggc gctcagcctt cccctcgatg 6420aaatggcgcc attgcgttcg gaagccacac cgaagagcgg ggaggggggg tgctccgggt 
6480ttgcgggccc ggtttcagag aagatcccaa gcttattaat agtaatcaat tacggggtca 6540ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct 6600ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta 6660acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac 6720ttggcagtac atcaagtgta tcatatgcca agtacgcccc ctattgacgt caatgacggt 6780aaatggcccg cctggcatta tgcccagtac atgaccttat gggactttcc tacttggcag 6840tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca gtacatcaat 6900gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat 6960gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa caactccgcc 7020ccattgacgc aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctggt 7080ttagtgaacc gtcagatcgg atccgcctga gaaaggaagt gagctgtaaa ggctgagctc 7140tctctctgac gtatgtagcc tctggttagc ttcgtcactc actgttcttg actcagcatg 7200gcaatctgat gaaatcccag ctgtaagtct gcagaaattg atgatctatt aaacaataaa 7260gatgtccact aaaatggaag tttttcctgt catactttgt taagaagggt gagaacagag 7320tacctacatt ttgaatggaa ggattggagc tacgggggtg ggggtggggt gggattagat 7380aaatgcctgc tctttactga aggctcttta ctattgcttt atgataatgt ttcatagttg 7440gatatcataa tttaaacaag caaaaccaaa ttaagggcca gctcattcct ccagatccac 7500tagtaattct gtggaatgtg tgtcagttag ggtgtggaaa gtccccaggc tccccagcag 7560gcagaagtat gcaaagcatg catctcaatt agtcagcaac caggtgtgga aagtccccag 7620gctccccagc aggcagaagt atgcaaagca tgcatctcaa ttagtcagca accatagtcc 7680cgcccctaac tccgcccatc ccgcccctaa ctccgcccag ttccgcccat tctccgcccc 7740atggctgact aatttttttt atttatgcag aggccgaggc cgcctctgcc tctgagctat 7800tccagaagta gtgaggaggc ttttttggag gcctaggctt ttgcaaaaag ctcccgggag 7860cttgtatatc cattttcgga tctgatcaag agacaggatg aggatcgttt cgcatgattg 7920aacaagatgg attgcacgca ggttctccgg ccgcttgggt ggagaggcta ttcggctatg 7980actgggcaca acagacaatc ggctgctctg atgccgccgt gttccggctg tcagcgcagg 8040ggcgcccggt tctttttgtc aagaccgacc tgtccggtgc cctgaatgaa ctgcaggacg 8100aggcagcgcg gctatcstgg ctggccacga cgggcgttcc ttgcgcagct gtgctcgacg 8160ttgtcactga agcgggaagg gactggctgc 
tattgggcga agtgccgggg caggatctcc 8220tgtcatctca ccttgctcct gccgagaaag tatccatcat ggctgatgca atgcggcggc 8280tgcatacgct tgatccggct acctgcccat tcgaccacca agcgaaacat cgcatcgagc 8340gagcacgtac tcggatggaa gccggtcttg tcgatcagga tgatctggac gaagagcatc 8400aggggctcgc gccagccgaa ctgttcgcca ggctcaaggc gcgcatgccc gacggcgagg 8460atctcgtcgt gacccatggc gatgcctgct tgccgaatat catggtggaa aatggccgct 8520tttctggatt catcgactgt ggccggctgg gtgtggcgga ccgctatcag gacatagcgt 8580tggctacccg tgatattgct gaagagcttg gcggcgaatg ggctgaccgc ttcctcgtgc 8640tttacggtat cgccgctccc gattcgcagc gcatcgcctt ctatcgcctt cttgacgagt 8700tcttctgagc gggactctgg ggttcgaaat gaccgaccaa gcgacgccca acctgccatc 8760acgagatttc gattccaccg ccgccttcta tgaaaggttg ggcttcggaa tcgttttccg 8820ggacgccggc tggatgatcc tccagcgcgg ggatctcatg ctggagttct tcgcccaccc 8880caacttgttt attgcagctt ataatggtta caaataaagc aatagcatca caaatttcac 8940aaataaagca tttttttcac tgcattctag ttgtggtttg tccaaactca tcaatgtatc 9000ttatcatgtc tgtataccgt cgagactagt tctagagcgg ccgccaccgc ggtggagctc 9060cagcttttgt tccctttagt gagggttaat ttcgagcttg gcgtaatcat ggtcatagct 9120gtttcctgtg tgaaattgtt atccgctcac aattccacac aacatacgag ccggaagcat 9180aaagtgtaaa gcctggggtg cctaatgagt gagctaactc acattaattg cgttgcgctc 9240actgcccgct ttccagtcgg gaaacctgtc gtgccagggg gtacctaggc cgggcaacaa 9300ttggcggccg gccgcacttt tcggggaaat gtgcgcggaa cccctatttg tttatttttc 9360taaatacatt caaatatgta tccgctcatg agacaataac cctgataaat gcttcaataa 9420tattgaaaaa ggaagagtat gagtattcaa catttccgtg tcgcccttat tccctttttt 9480gcggcatttt gccttcctgt ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct 9540gaagatcagt tgggtgcacg agtgggttac atcgaactgg atctcaacag cggtaagatc 9600cttgagagtt ttcgccccga agaacgtttt ccaatgatga gcacttttaa agttctgcta 9660tgtggcgcgg tattatcccg tattgacgcc gggcaagagc aactcggtcg ccgcatacac 9720tattctcaga atgacttggt tgagtactca ccagtcacag aaaagcatct tacggatggc 9780atgacagtaa gagaattatg cagtgctgcc ataaccatga gtgataacac tgcggccaac 9840ttacttctga caacgatcgg aggaccgaag gagctaaccg cttttttgca caacatgggg 
9900gatcatgtaa ctcgccttga tcgttgggaa ccggagctga atgaagccat accaaacgac 9960gagcgtgaca ccacgatgcc tgtagcaatg gcaacaacgt tgcgcaaact attaactggc 10020gaactactta ctctagcttc ccggcaacaa ttaatagact ggatggaggc ggataaagtt 10080gcaggaccac ttctgcgctc ggcccttccg gctggctggt ttattgctga taaatctgga 10140gccggtgagc gtgggtctcg cggtatcatt gcagcactgg ggccagatgg taagccctcc 10200cgtatcgtag ttatctacac gacggggagt caggcaacta tggatgaacg aaatagacag 10260atcgctgaga taggtgcctc actgattaag cattggtaac tgtcagaccc taggccgggc 10320aacaattggc ggccggccct gcattaatga atcggccaac gcgcggggag aggcggtttg 10380cgtattgggc gctcttccgc ttcctcgctc actgactcgc tgcgctcggt cgttcggctg 10440cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt tatccacaga atcaggggat 10500aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc 10560gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc 10620tcaagtcaga ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga 10680agctccctcg tgcgctctcc tgttccgacc ctgccgctta ccggatacct gtccgccttt 10740ctcccttcgg gaagcgtggc gctttctcat agctcacgct gtaggtatct cagttcggtg 10800taggtcgttc gctccaagct gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc 10860gccttatccg gtaactatcg tcttgagtcc aacccggtaa gacacgactt atcgccactg 10920gcagcagcca ctggtaacag gattagcaga gcgaggtatg taggcggtgc tacagagttc 10980ttgaagtggt ggcctaacta cggctacact agaaggacag tatttggtat ctgcgctctg 11040ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa acaaaccacc 11100gctggtagcg gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct 11160caagaagatc ctttgatctt ttctacgggg tctgacgctc agtggaacga aaactc 11216 3211105 DNA Artificial Sequence Artificial Sequence containing human UCOEelements and vector sequence 32 acgttgtaaa acgacggcca gtgaattgtaatacgactca ctatagggcg aattgggtac 60 cgggcccccc ctcgaggtcg agttggggtggggaaaagga agaaacgcgg gcgtattggc 120 cccaatgggg tctcggtggg gtatcgacagagtgccagcc ctgggaccga accccgcgtt 180 tatgaacaaa cgacccaaca cccgtgcgttttattctgtc tttttattgc cgtcatagcg 240 cgggttcctt ccggtattgt ctccttccgtcgacggtatc aaggtggcga 
ccggaatggt 300 gagctgcgag aatagccggg cgcgctgtgagccgaagtcg cccccgccct ggccacttcc 360 ggcgcgccga gtccttaggc cgccagggggcgccggcgcg cgcccagatt ggggacaaag 420 gaagccgggc cggccgcgtt attaccataaaaggcaaaca ctggtcggag gcgtccccgc 480 ggcgcgcggc aggaagccag gccccaaccccctcccaacc gggcgccagc cccgcctccg 540 cccggttcaa acagcgaccg ggtcgcgcgcgcgcacgcag cggccacacc ctcgggcgcc 600 agcggctcgg gcaggaagtg gcgcaagcgcccgggcccca gaacgcacgc gcgattagcg 660 ccattgagtc ccagcgcgca cgcgcaattagcgccaattc ccagcgcgca cgcagttagc 720 gcccaaagga ccagcgcgca cgcgcatggcgccccagccc ccaccgggcc tgacgggggc 780 tacgccgcgc ccaccgtgcg atccccattggcaagagccc ggctcagaca aagaccccgc 840 cggttgcccc cgccccgaga gcggcacccccggagcgcgc ccgcccgagc gcggcctcgc 900 gcctgcgaac tggcgtgggg tgtcccccatctccggaggc ccaggggctt ctcccgcgcc 960 ccccacggcg gtccggttcc gccccatgcgccccccgctg cggcccagac ggcggctctg 1020 cacgggcgaa gggccgcggc cgcatgccccggtcggctgg ccgggcttac ctggcggcgg 1080 gtgtggacgg gcggcggatc ggcaaaggcgaggctctgtg ctcgcgggcg gacgcggtct 1140 cggcggtggt ggcgcgtcgc gccgctgggttttatagggc gccgccgcgg ccgctcgagc 1200 cataaaaggc aactttcgga acggcgcacgctgattggcc ccgcgccgct cactcaccgg 1260 cttcgccgca cagtgcagca tttttttaccccctctcccc tccttttgcg aaaaaaaaaa 1320 agagcgagag cgagattgag gaagaggaggagggagagtt ttggcgttgg ccgccttggg 1380 gtgctgggcc cgggggctgg gggcgcgcgccgtggccccc gcgccccacg ctgggcagtg 1440 cccggttcgg ccccgcatgg ccaggcctgcccccggcctg cccgtctctc gggcccccca 1500 cccaccgcgg gacatcctag gtgtggacatctcttgggca ctgagcgccc aggtggggtg 1560 ggccagggtc tgcacgggtg ccagggccctgggttctgta cgctcctgca gaaggagctc 1620 ttggagggca tggagtggcc aggcagtcactcccccttgc cgacttcaga gcaactgccc 1680 tgaaagcagg gcctgaggac ctctggctgtggggctcagc tagctaaatg tgctgggtgg 1740 gtcactaggg agagacctgg gcttgagaggtagagtgtgg tgttggggga gtcaggtggc 1800 ttgcggccat tagagtcgca ggaccacactccccaggaca gggcaggggc cagcggtcca 1860 gtggctggag gtggcccgtg atgaaggctacaaacctacc cagccgcagc cctgggaagg 1920 aagtgggctc tacagggcag ggcaccttttaccctggagc tgcctgcttt tgagggtaac 1980 agtcacgccc agccaagacc 
aggcctggggcgttagtggg tgacctaggc actgcggggc 2040 gggggggctg ggtctacaca gcctgggtctgggcccaccg tccgttgtat gtctgctatg 2100 cgcagccaca gctgaactgc cctcccagaccatctggagg ccgctggggg actctgggga 2160 ccaagactcc atgtgccaca gaggattgggggcggggcgg tgctaggaac tcaaagccag 2220 cctgggaaga ccctgtcctt gtcaccctttcttgccttgg gtctgtccac tgagtagcac 2280 acaagaccgg gtgggcaggg tccgttctgctccgggaatc acagactgtg tgtacccagg 2340 tggtgggcat gcagcgatca gtggcgtgggaccacagagg gggcccgcgg taccaagctt 2400 gggaattgcg tgcaaaaaca acttctgttttccagggtaa acagaatcta atgcagaatc 2460 taatgcaggg taaacagact taatgcagaatctaatgatg gcacaaatta aaaatcacta 2520 acgtgccctt tttagtgtga aacccagagagagcacatac aagccaaaaa caaatgcttt 2580 attttaccta ggagacatta acattcacctttacgtgttt aagattaatg caatgttaaa 2640 tattgtgaaa actgtaactt tgaatttcatgatttttatg tgaatattcc agggtttaaa 2700 aaaacttgta acatgacatg gctgaataagataaaaaaaa aatctagcct tttctccctt 2760 ctggctcata tttgcgattt cgatcattttgtttaaaaaa caaaacactg caatgaatta 2820 aacttaatat tcttctatgt tttagagtaagttaaaacaa gataaagtga ccaaagtaat 2880 ttgaaagatt caatgacttt tgctccaacctaggtgcaca aggtaccttg ttctttaaat 2940 tgggctttaa tgaaaatact tctccagaattctggggatt taagaaaaat tatgccaacc 3000 aacaagggct ttaccatttt atgtaacatttttcaacgct gcaaaaatgt gtgtatttct 3060 atttgaagat aaaaatcctc agcaaaatccacattgcact gtccttcaaa gattagcctt 3120 ctttgaacta gttaagacac tattaagccaagccagtatc tccctgtaat gaattcgttt 3180 ttctcttaat tttcccctgt aatttacactgggagagctg ggaaatatgt ggatgtaaat 3240 ttctcagcca cagagatgca aagttatactgtggggaaaa aaaacttgag ttaaatcctt 3300 acatatttta ggttttcatt aacttaccaatgtagttttg ttggaggcca ttttttttat 3360 tgcagacttg aagagctatt actagaaaaatgcatgacag ttaaggtaag tttgcatgac 3420 acaaaaaagg taactaaata caaattctgtttggattcca acccccaagt agagagcgca 3480 cactttcaaa cgtgaataca aatccagagtagatctgcgc tcctacctac attgcttatg 3540 atgtacttaa gtacgtgtcc taaccatgtgagtctagaaa gactttactg gggatcctgg 3600 tacctaaaac agcttcacat ggcttaaaataggggaccaa tgtcttttcc aatctaagtc 3660 ccatttataa taaagtccat gttccatttttaaaggacaa tcctttcggt 
ttaaaaccag 3720 gcacgattac ccaaacaact cacaacggtaaagcactgtg aatcttctct gttctgcaat 3780 cccaacttgg tttctgctca gaaaccctccctctttccaa tcggtaatta aataacaaaa 3840 ggaaaaaact taagatgctt caaccccgtttcgtgacact ttgaaaaaag aatcacctct 3900 tgcaaacacc cgctcccgac ccccgccgctgaagcccggc gtccagaggc ctaagcgcgg 3960 gtgcccgccc ccacccggga gcgcgggcctcgtggtcagc gcatccgcgg ggagaaacaa 4020 aggccgcggc acgggggctc aagggcactgcgccacaccg cacgcgccta cccccgcgcg 4080 gccacgttaa ctggcggtcg ccgcagcctcgggacagccg gccgcgcgcc gccaggctcg 4140 cggacgcggg accacgcgcc gccctccgggaggcccaagt ctcgacccag ccccgcgtgg 4200 cgctggggga gggggcgcct ccgccggaacgcgggtgggg gaggggaggg ggaaatgcgc 4260 tttgtctcga aatggggcaa ccgtcgccacagctccctac cccctcgagg gcagagcagt 4320 ccccccacta actaccgggc tggccgcgcgccaggccagc cgcgaggcca ccgcccgacc 4380 ctccactcct tcccgcagct cccggcgcggggtccggcga gaaggggagg ggaggggagc 4440 ggagaaccgg gcccccggga cgcgtgtggcatctgaagca ccaccagcga gcgagagcta 4500 gagagaagga aagccaccga cttcaccgcctccgagctgc tccgggtcgc gggtctgcag 4560 cgtctccggc cctccgcgcc tacagctcaagccacatccg aagggggagg gagccgggag 4620 ctgcgcgcgg ggccgccggg gggaggggtggcaccgccca cgccgggcgg ccacgaaggg 4680 cggggcagcg ggcgcgcgcg cggcggggggaggggccggc gccgcgcccg ctgggaattg 4740 gggccctagg gggagggcgg aggcgccgacgaccgcggca cttaccgttc gcggcgtggc 4800 gcccggtggt ccccaagggg agggaagggggaggcggggc gaggacagtg accggagtct 4860 cctcagcggt ggcttttctg cttggcagcctcagcggctg gcgccaaaac cggactccgc 4920 ccacttcctc gcccgccggt gcgagggtgtggaatcctcc agacgctggg ggagggggag 4980 ttgggagctt aaaaactagt acccctttgggaccactttc agcagcgaac tctcctgtac 5040 accaggggtc agttccacag acgcgggccaggggtgggtc attgcggcgt gaacaataat 5100 ttgactagaa gttgattcgg gtgtttccggaaggggccga gtcaatccgc cgagttgggg 5160 cacggaaaac aaaaagggaa ggctactaagatttttctgg cgggggttat cattggcgta 5220 actgcaggga ccacctcccg ggttgagggggctggatctc caggctgcgg attaagcccc 5280 tcccgtcggc gttaatttca aactgcgcgacgtttctcac ctgccttcgc caaggcaggg 5340 gccgggaccc tattccaaga ggtagtaactagcaggactc tagccttccg caattcattg 5400 agcgcattta cggaagtaac 
gtcgggtactgtctctggcc gcaagggtgg gaggagtacg 5460 catttggcgt aaggtggggc gtagagccttcccgccattg gcggcggata gggcgtttac 5520 gcgacggcct gacgtagcgg aagacgcgttagtggggggg aaggttctag aaaagcggcg 5580 gcagcggctc tagcggcagt agcagcagcgccgggtcccg tgcggaggtg ctcctcgcag 5640 agttgtttct cgagcagcgg cagttctcactacagcgcca ggacgagtcc ggttcgtgtt 5700 cgtccgcgga gatctctctc atctcgctcggctgcgggaa atcgggctga agcgactgag 5760 tccgcgatgg aggtaacggg tttgaaatcaatgagttatt gaaaagggca tggcgaggcc 5820 gttggcgcct cagtggaagt cggccagccgcctccgtggg agagaggcag gaaatcggac 5880 caattcagta gcagtggggc ttaaggtttatgaacggggt cttgagcgga ggcctgagcg 5940 tacaaacagc ttccccaccc tcagcctcccggcgccattt cccttcactg ggggtggggg 6000 atggggagct ttcacatggc ggacgctgccccgctggggt gaaagtgggg cgcggaggcg 6060 ggaattctta ttccctttct aaagcacgctgcttcggggg ccacggcgtc tcctcggcga 6120 gcgtttcggc gggcagcagg tcctcgtgagcgaggctgcg gagcttcccc tccccctctc 6180 tcccgggaac cgatttggcg gccgccattttcatggctcg ccttcctctc agcgttttcc 6240 ttataactct tttattttct tagtgtgctttctctatcaa gaagtagaag tggttaacta 6300 tttttttttt cttctcgggc tgttttcatatcgtttcgag gtggatttgg agtgttttgt 6360 gagcttggat ctttagagtc ctgcgcacctcattaaaggc gctcagcctt cccctcgatg 6420 aaatggcgcc attgcgttcg gaagccacaccgaagagcgg ggaggggggg tgctccgggt 6480 ttgcgggccc ggtttcagag aagatcccaagcttattaat agtaatcaat tacggggtca 6540 ttagttcata gcccatatat ggagttccgcgttacataac ttacggtaaa tggcccgcct 6600 ggctgaccgc ccaacgaccc ccgcccattgacgtcaataa tgacgtatgt tcccatagta 6660 acgccaatag ggactttcca ttgacgtcaatgggtggagt atttacggta aactgcccac 6720 ttggcagtac atcaagtgta tcatatgccaagtacgcccc ctattgacgt caatgacggt 6780 aaatggcccg cctggcatta tgcccagtacatgaccttat gggactttcc tacttggcag 6840 tacatctacg tattagtcat cgctattaccatggtgatgc ggttttggca gtacatcaat 6900 gggcgtggat agcggtttga ctcacggggatttccaagtc tccaccccat tgacgtcaat 6960 gggagtttgt tttggcacca aaatcaacgggactttccaa aatgtcgtaa caactccgcc 7020 ccattgacgc aaatgggcgg taggcgtgtacggtgggagg tctatataag cagagctggt 7080 ttagtgaacc gtcagatcgg atccgcctgagaaaggaagt gagctgtaaa 
ggctgagctc 7140 tctctctgac gtatgtagcc tctggttagcttcgtcactc actgttcttg actcagcatg 7200 gcaatctgat gaaatcccag ctgtaagtctgcagaaattg atgatctatt aaacaataaa 7260 gatgtccact aaaatggaag tttttcctgtcatactttgt taagaagggt gagaacagag 7320 tacctacatt ttgaatggaa ggattggagctacgggggtg ggggtggggt gggattagat 7380 aaatgcctgc tctttactga aggctctttactattgcttt atgataatgt ttcatagttg 7440 gatatcataa tttaaacaag caaaaccaaattaagggcca gctcattcct ccagatccac 7500 tagttctaga gcaaattcta ccgggtaggggaggcgcttt tcccaaggca gtctggagca 7560 tgcgctttag cagccccgct gggcacttggcgctacacaa gtggcctctg gcctcgcaca 7620 cattccacat ccaccggtag gcgccaaccggctccgttct ttggtggccc cttcgcgcca 7680 ccttctactc ctcccctagt caggaagttcccccccgccc cgcagctcgc gtcgtgcagg 7740 acgtgacaaa tggaagtagc acgtctcactagtctcgtgc agatggacag caccgctgag 7800 caatggaagc gggtaggcct ttggggcagcggccaatagc agctttgctc cttcgctttc 7860 tgggctcaga ggctgggaag gggtgggtccgggggcgggc tcaggggcgg gctcaggggc 7920 ggggcgggcg cccgaaggtc ctccggaggcccggcattct gcacgcttca aaagcgcacg 7980 tctgccgcgc tgttctcctc ttcctcatctccgggccttt cgaccagctt accatgaccg 8040 agtacaagcc cacggtgcgc ctcgccacccgcgacgacgt ccccagggcc gtacgcaccc 8100 tcgccgccgc gttcgccgac taccccgccacgcgccacac cgtcgatccg gaccgccaca 8160 tcgagcgggt caccgagctg caagaactcttcctcacgcg cgtcgggctc gacatcggca 8220 aggtgtgggt cgcggacgac ggcgccgcggtggcggtctg gaccacgccg gagagcgtcg 8280 aagcgggggc ggtgttcgcc gagatcggcccgcgcatggc cgagttgagc ggttcccggc 8340 tggccgcgca gcaacagatg gaaggcctcctggcgccgca ccggcccaag gagcccgcgt 8400 ggttcctggc caccgtcggc gtctcgcccgaccaccaggg caagggtctg ggcagcgccg 8460 tcgtgctccc cggagtggag gcggccgagcgcgccggggt gcccgccttc ctggagacct 8520 ccgcgccccg caacctcccc ttctacgagcggctcggctt caccgtcacc gccgacgtcg 8580 aggtgcccga aggaccgcgc acctggtgcatgacccgcaa gcccggtgcc tgacgcccgc 8640 cccacgaccc gcagcgcccg accgaaaggagcgcacgacc ccatgcatag gttgggcttc 8700 ggaatcgttt tccgggacgc cggctggatgatcctccagc gcggggatct catgctggag 8760 ttcttcgccc accccaactt gtttattgcagcttataatg gttacaaata aagcaatagc 8820 atcacaaatt tcacaaataa 
agcatttttttcactgcatt ctagttgtgg tttgtccaaa 8880 ctcatcaatg tatcttatca tgtctgtataccgtcgagat ctagagcggc cgccaccgcg 8940 gtggagctcc agcttttgtt ccctttagtgagggttaatt tcgagcttgg cgtaatcatg 9000 gtcatagctg tttcctgtgt gaaattgttatccgctcaca attccacaca acatacgagc 9060 cggaagcata aagtgtaaag cctggggtgcctaatgagtg agctaactca cattaattgc 9120 gttgcgctca ctgcccgctt tccagtcgggaaacctgtcg tgccaggggg tacctaggcc 9180 gggcaacaat tggcggccgg ccgcacttttcggggaaatg tgcgcggaac ccctatttgt 9240 ttatttttct aaatacattc aaatatgtatccgctcatga gacaataacc ctgataaatg 9300 cttcaataat attgaaaaag gaagagtatgagtattcaac atttccgtgt cgcccttatt 9360 cccttttttg cggcattttg ccttcctgtttttgctcacc cagaaacgct ggtgaaagta 9420 aaagatgctg aagatcagtt gggtgcacgagtgggttaca tcgaactgga tctcaacagc 9480 ggtaagatcc ttgagagttt tcgccccgaagaacgttttc caatgatgag cacttttaaa 9540 gttctgctat gtggcgcggt attatcccgtattgacgccg ggcaagagca actcggtcgc 9600 cgcatacact attctcagaa tgacttggttgagtactcac cagtcacaga aaagcatctt 9660 acggatggca tgacagtaag agaattatgcagtgctgcca taaccatgag tgataacact 9720 gcggccaact tacttctgac aacgatcggaggaccgaagg agctaaccgc ttttttgcac 9780 aacatggggg atcatgtaac tcgccttgatcgttgggaac cggagctgaa tgaagccata 9840 ccaaacgacg agcgtgacac cacgatgcctgtagcaatgg caacaacgtt gcgcaaacta 9900 ttaactggcg aactacttac tctagcttcccggcaacaat taatagactg gatggaggcg 9960 gataaagttg caggaccact tctgcgctcggcccttccgg ctggctggtt tattgctgat 10020 aaatctggag ccggtgagcg tgggtctcgcggtatcattg cagcactggg gccagatggt 10080 aagccctccc gtatcgtagt tatctacacgacggggagtc aggcaactat ggatgaacga 10140 aatagacaga tcgctgagat aggtgcctcactgattaagc attggtaact gtcagaccct 10200 aggccgggca acaattggcg gccggccctgcattaatgaa tcggccaacg cgcggggaga 10260 ggcggtttgc gtattgggcg ctcttccgcttcctcgctca ctgactcgct gcgctcggtc 10320 gttcggctgc ggcgagcggt atcagctcactcaaaggcgg taatacggtt atccacagaa 10380 tcaggggata acgcaggaaa gaacatgtgagcaaaaggcc agcaaaaggc caggaaccgt 10440 aaaaaggccg cgttgctggc gtttttccataggctccgcc cccctgacga gcatcacaaa 10500 aatcgacgct caagtcagag gtggcgaaacccgacaggac 
tataaagata ccaggcgttt 10560 ccccctggaa gctccctcgt gcgctctcctgttccgaccc tgccgcttac cggatacctg 10620 tccgcctttc tcccttcggg aagcgtggcgctttctcata gctcacgctg taggtatctc 10680 agttcggtgt aggtcgttcg ctccaagctgggctgtgtgc acgaaccccc cgttcagccc 10740 gaccgctgcg ccttatccgg taactatcgtcttgagtcca acccggtaag acacgactta 10800 tcgccactgg cagcagccac tggtaacaggattagcagag cgaggtatgt aggcggtgct 10860 acagagttct tgaagtggtg gcctaactacggctacacta gaaggacagt atttggtatc 10920 tgcgctctgc tgaagccagt taccttcggaaaaagagttg gtagctcttg atccggcaaa 10980 caaaccaccg ctggtagcgg tggtttttttgtttgcaagc agcagattac gcgcagaaaa 11040 aaaggatctc aagaagatcc tttgatcttttctacggggt ctgacgctca gtggaacgaa 11100 aactc 11105

What is claimed:
 1. A composition for achieving high-level, large scale protein and/or polypeptide expression, said composition comprising: (a) an immortalized host cell-line, capable of continuous growth in culture wherein said host cell-line is capable of growth in serum-free suspension culture, and (b) a vector for sustained overexpression of a recombinant protein and/or polypeptide, wherein said host cell-line is transfected with said vector.
 2. The composition of claim 1 wherein said immortalized host cell-line has a doubling time of no more than 16 hours.
 3. The composition of claim 2 wherein said doubling time is no more than 12 hours.
 4. The composition of claim 1 having an efficiency of transfection of at least 70%.
 5. The composition of claim 4 wherein said efficiency of transfection is at least 75%.
 6. The composition of claim 4 wherein said efficiency of transfection is at least 85%.
 7. The composition of claim 4 wherein said efficiency of transfection is at least 95%.
 8. The composition of claim 1 wherein said host cell-line is susceptible to selection agents selected from the group consisting of: hygromycin, G418, and puromycin.
 9. The composition of claim 1 wherein said host cell-line is characterized by the absence of gal-gal glycosylation of said recombinant protein and/or polypeptide.
 10. The composition of claim 1 wherein said host cell-line is selected from the group consisting of CHO-S, 293-F, 293-H, COS-7L, D.Mel-2, Sf21, and Sf9.
 11. The composition of claim 1 wherein said vector further comprises a property selected from the group consisting of (a) containing one or more elements that facilitate high-level, large-scale expression in the immortalized host cell-line and (b) resistance to repression of the recombinant protein and/or polypeptide.
 12. The composition of claim 1 wherein said vector further comprises one or more universal chromatin opening elements (UCOEs).
 13. The composition of claim 1 wherein said composition is characterized in being capable of achieving expression levels of at least 50 mg recombinant protein and/or polypeptide per liter of culture.
 14. The composition of claim 13 wherein said composition is characterized in being capable of achieving expression levels of at least 100 mg recombinant protein and/or polypeptide per liter of culture.
 15. The composition of claim 13 wherein said composition is characterized in being capable of achieving expression levels of at least 200 mg recombinant protein and/or polypeptide per liter of culture.
 16. The composition of claim 1 wherein said composition is capable of scale-up to at least 100 liter scale and wherein said composition is capable of yields of at least 1 gram of protein and/or polypeptide.
 17. The composition of claim 16 wherein said composition is capable of yields of at least 10 grams of protein and/or polypeptide.
 18. The composition of claim 16 wherein said composition is capable of yields of at least 20 grams of protein and/or polypeptide.
 19. A method for the high-level, large-scale production of a protein and/or polypeptide, said method comprising the steps of (a) obtaining an immortalized host cell-line capable of growth in suspension; (b) adapting said immortalized host cell-line for growth in serum-free medium; (c) transfecting said serum-free growth adapted immortalized cell-line with a vector suitable for high-level expression of a recombinant protein and/or polypeptide.
 20. The method of claim 19 wherein said immortalized host cell-line has a doubling time of no more than 16 hours.
 21. The method of claim 20 wherein said doubling time is no more than 12 hours.
 22. The method of claim 19 having an efficiency of transfection of at least 70%.
 23. The method of claim 22 wherein said efficiency of transfection is at least 75%.
 24. The method of claim 22 wherein said efficiency of transfection is at least 85%.
 25. The method of claim 22 wherein said efficiency of transfection is at least 95%.
 26. The method of claim 19 wherein said host cell-line is susceptible to selection agents selected from the group consisting of: hygromycin, G418, and puromycin.
 27. The method of claim 19 wherein said host cell-line is characterized by the absence of gal-gal glycosylation of said recombinant protein and/or polypeptide.
 28. The method of claim 19 wherein said host cell-line is selected from the group consisting of CHO-S, 293-F, 293-H, COS-7L, D.Mel-2, Sf21, and Sf9.
 29. The method of claim 19 wherein said vector further comprises a property selected from the group consisting of (a) containing one or more elements that facilitate high-level, large-scale expression in the immortalized host cell-line and (b) resistance to repression of the recombinant protein and/or polypeptide.
 30. The method of claim 19 wherein said vector further comprises one or more universal chromatin opening elements (UCOEs).
 31. The method of claim 19 wherein said method is characterized in being capable of achieving expression levels of at least 50 mg recombinant protein and/or polypeptide per liter of culture.
 32. The method of claim 31 wherein said method is characterized in being capable of achieving expression levels of at least 100 mg recombinant protein and/or polypeptide per liter of culture.
 33. The method of claim 31 wherein said method is characterized in being capable of achieving expression levels of at least 200 mg recombinant protein and/or polypeptide per liter of culture.
 34. The method of claim 19 wherein said method is capable of scale-up to at least 100 liter scale and wherein said method is capable of yields of at least 1 gram of protein and/or polypeptide.
 35. The method of claim 34 wherein said method is capable of yields of at least 10 grams of protein and/or polypeptide.
 36. The method of claim 34 wherein said method is capable of yields of at least 20 grams of protein and/or polypeptide.
 37. A bi-directional vector for high-level, large-scale expression, of a multisubunit protein and/or polypeptide, said composition comprising: (a) at least one UCOE element; and (b) a first transcriptional promoter; and (c) a second transcriptional promoter; wherein said UCOE element is operably linked to said first and said second transcriptional promoter and wherein said first transcriptional promoter is oriented in the opposite direction as said second transcriptional promoter.
 38. The bi-directional vector of claim 37 wherein said UCOE element is an RNP UCOE.
 39. The bi-directional vector of claim 37 wherein said first transcriptional promoter is selected from the group consisting of a human CMV promoter, a murine CMV promoter and a human beta-actin promoter.
 40. A composition for achieving high-level, large scale protein and/or polypeptide expression, said composition comprising: (a) an immortalized host cell-line, capable of continuous growth in culture wherein said host cell-line is capable of growth in serum-free suspension culture, and (b) the bi-directional vector of claim 37, wherein said host cell-line is transfected with said vector.
 41. A method for the high-level, large-scale production of a protein and/or polypeptide, said method comprising the steps of (a) obtaining a host cell-line capable of continuous growth; (b) adapting said host cell-line for growth in serum-free medium to create a cell-line capable of continuous growth in serum-free medium; (c) transfecting said cell-line capable of continuous growth in serum-free medium with a vector of claim 37.
 42. The method of claim 41 wherein said host cell-line capable of continuous growth is also capable of growth in suspension.
 43. The method of claim 42 wherein said host cell-line capable of continuous growth in suspension is a CHO-S cell-line.
 44. A vector for high-level, large scale expression, of a multisubunit protein and/or polypeptide, said composition comprising: (a) at least one UCOE element; and (b) a transcriptional promoter; said vector further comprising one or more deletions within regions of the RNP UCOE selected from the group consisting of ΔBS, ΔEcoNI, ΔEM, ΔMluI, and ΔRV as depicted in Table 4 and FIG. 14.
 45. The vector of claim 44 wherein said deletion is within the region of the RNP UCOE depicted by ΔBS in Table 4 and FIG. 14.
 46. The vector of claim 44 wherein said deletion is at least 100 bp.
 47. The vector of claim 44 wherein said deletion is at least 1,000 bp.
 48. The vector of claim 44 wherein said deletion is at least 4,000 bp.